ABSTRACT


The design of a sixteen-bit pipelined adder CMOS integrated
circuit is presented. The adder is designed to maximize
throughput and to provide for testability. Tutorial material
on CMOS design is also presented.

I.  INTRODUCTION

For  several  years  the  ability  of  systems  engineers  to 
design  custom  digital  integrated  circuits  has  been  growing. 
The  Mead  and  Conway  design  methodology  described  in 
Introduction to VLSI Systems [Ref. 1] permits the systems
engineer to be his own logic circuit designer. A proliferation
of computer-aided design (CAD) systems such as the
MacPitts silicon compiler [Ref. 2], the chip layout language
(CLL) [Ref. 3], the graphics editor Caesar [Ref. 4], and the
Burlap hierarchical layout language [Ref. 5] makes it
possible for the engineer to rapidly carry the Mead and
Conway design methodology through to a final design. This
includes iterative simulation and redesign to provide
justifiable confidence in the final design submitted for
fabrication.

Many  of  the  techniques  utilized  in  the  Mead  and  Conway 
methodology  and  most  of  the  CAD  tools  are  based  on  having 
the  final  design  implemented  in  a  technology  that  uses  only 
one  type  of  doping  for  the  semiconductor  material  in  the 
active region of the transistors. Because of their higher
switching speed, negatively doped metal oxide semiconductor
(NMOS) transistor technologies are generally used.

Selection  of  an  NMOS  implementation  technology  does 
provide  the  systems  engineer  with  a  complete  and  proven 
methodology for the design of a very large scale integrated
(VLSI) circuit and allows the use of many extensively tested
CAD tools. Like any other design decision, selection of
NMOS implementation brings with it some limitations. There
are  two  primary  problems  associated  with  NMOS  digital 
circuits. 


The  first  is  the  ultimate  switching  speed  limitation. 
Though  many  NMOS  VLSI  circuits  operate  at  clock  rates  in  the 
8  to  10  MHz  range,  there  are  many  applications  requiring 
higher  clock  rates.  The  second  problem  is  the  dissipation 
of  the  relatively  large  amount  of  power  consumed  by  NMOS 
digital  circuits.  State  of  the  art,  commercially  available 
NMOS  VLSI  circuits  commonly  have  power  consumptions  in  the 
vicinity  of  3  to  5  watts.  Considerable  design  effort  is 
required to ensure that the dissipation of this much energy
by  a  chip  measuring  approximately  5  millimeters  on  a  side 
does  not  alter  the  performance  of  the  micron  sized  features 
on  the  chip. 
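The scale of the dissipation problem can be made concrete with a short calculation. The sketch below simply divides the quoted 3 to 5 watts by the area of a square die 5 millimeters on a side; the function name is illustrative, and a uniform distribution of power over the die is an assumption made only for this estimate.

```python
def power_density_w_per_cm2(power_w, side_mm):
    """Average power density of a square die in watts per square centimeter."""
    side_cm = side_mm / 10.0
    return power_w / (side_cm * side_cm)


# Figures quoted in the text: 3 to 5 watts on a die about 5 mm on a side.
low = power_density_w_per_cm2(3.0, 5.0)
high = power_density_w_per_cm2(5.0, 5.0)
print(f"{low:.0f} to {high:.0f} W/cm^2")
```

Under these assumptions the average density works out to roughly 12 to 20 watts per square centimeter.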

One  group  of  technologies  that  offers  both  increased 
switching  speed  and  greatly  reduced  power  consumption  is 
complementary metal oxide semiconductors (CMOS). CMOS
circuits also offer the benefits of greater radiation
hardening and increased noise margin. In this thesis
investigation, much of the Mead and Conway methodology was utilized
in  the  design  of  a  CMOS  circuit.  A  general  purpose  color 
graphics  CAD  tool  called  Caesar  that  has  been  frequently 
used  in  the  design  of  NMOS  circuits  was  employed.  In 
carrying out the design of the 16 bit pipelined high speed
adder in CMOS, two separate goals were pursued. The first,
of course, is speed and the second is verifiability. A high
speed  adder  implies  not  only  a  high  clock  rate  of  operation 
but  also  a  small  latency  between  input  of  operands  and 
output  of  the  sum. 

A  discussion  of  CMOS  technologies  and  the  implementation 
of  logic  circuits  in  those  technologies  follows  in  Chapter 
2.  Chapter  3  presents  a  description  of  the  CAD  tools  used 
to  construct  and  simulate  the  layout  for  the  adder.  The 
logic  and  layout  design  of  the  adder  is  covered  in  Chapter  4 
and  is  followed  by  a  test  plan  for  the  fabricated  chip  in 
Chapter  5. 


II.  CMOS CIRCUITS

Before  the  design  of  CMOS  digital  circuits  can  be 
attempted,  an  understanding  of  how  to  best  implement  logic 
functions  in  CMOS  is  necessary.  It  is  also  important  to  be 
aware  of  the  advantages  and  disadvantages  of  the  different 
CMOS implementation technologies. In this chapter the
operation of CMOS digital circuits is explained using similar
NMOS  circuits  as  a  benchmark  for  comparison.  The  different 
methodologies  for  assembling  the  CMOS  pieces  to  produce  the 
desired  logical  results  are  reviewed  and  the  selection  of 
the    CMOS-Bulk    p-well    implementation    technology    is    explained. 

A.  COMPARISON WITH NMOS

In  NMOS  digital  circuits  there  is  only  one  type  of 
switching  device,  namely  the  n-channel  enhancement  mode 
metal oxide semiconductor (MOS) transistor. The other
principal device utilized in NMOS circuits is the depletion mode
n-channel MOS device which acts as a load resistor. In CMOS
there are both n-channel and p-channel enhancement mode
transistors available. As in NMOS, the n-channel device can
be considered on when Vdd (typically +5 Volts DC), a logical
1, is present on its gate. The p-channel device can be
considered on when ground (GND), a logical 0, is present on
its gate. Figure 2.1 shows the symbols that will be used
for the n-channel and p-channel transistors in this thesis.

The  basic  differences  between  NMOS  and  CMOS  technologies 
can  be  demonstrated  by  comparing  their  application  to  some 
basic digital circuits.

Figure 2.1  CMOS Transistor Symbols.

1.  The Inverter


Figure 2.2 (a) shows an NMOS inverter. Whenever
there is a logical 1 on the input, the voltage drop across
the load resistor is approximately Vdd and the output is a
logical 0. This results in steady state power consumption.
When the input switches to a logical 0, before the output
can assume a logical 1, the load capacitance (CL) on the
output must be charged to Vdd through the load resistor with
a resistance of several kilohms. This results in a much
longer transition from 0 to 1, where the load capacitance is
charged through the load resistor, than from 1 to 0, where
the load capacitance is discharged through the switched on
NMOS enhancement transistor. The reason for this asymmetry
is that the pull-down transistor's on resistance is
typically only one fourth or less that of the on resistance of
the pull-up load depletion mode transistor. The technique
of precharging circuits, where all outputs are set to
logical 1 during one clock cycle and then selectively forced
to 0 on the opposite (evaluation) clock cycle, has proven
helpful in gaining control over the unsymmetric switching
times. This longer switching time from 0 to 1 must still be
accounted for, however, and represents the primary
limitation to the speed of NMOS circuits.
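The asymmetry can be illustrated with a first-order RC estimate. The sketch below treats both the depletion load and the switched-on pull-down transistor as fixed resistors, which is a simplification, and uses the standard result that a single RC stage takes ln(9) RC to move between 10% and 90% of its final value. The resistance and capacitance values are hypothetical, chosen only so that the pull-down resistance is one fourth of the pull-up resistance as described above.

```python
import math


def transition_time(resistance_ohm, capacitance_f):
    """10%-90% transition time of a first-order RC circuit: ln(9) * R * C."""
    return math.log(9.0) * resistance_ohm * capacitance_f


# Hypothetical values, not taken from the thesis.
R_PULL_UP = 8e3      # depletion load, "several kilohms"
R_PULL_DOWN = 2e3    # enhancement pull-down, one fourth of the load
C_LOAD = 0.1e-12     # load capacitance in farads

rise = transition_time(R_PULL_UP, C_LOAD)    # 0 to 1, through the load
fall = transition_time(R_PULL_DOWN, C_LOAD)  # 1 to 0, through the pull-down
print(f"rise {rise * 1e9:.2f} ns, fall {fall * 1e9:.2f} ns, ratio {rise / fall:.1f}")
```

With a fixed load capacitance the ratio of the two transition times equals the ratio of the resistances, so the 0 to 1 transition is about four times slower in this model.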
Figure 2.2  (a) NMOS Inverter  (b) CMOS Inverter.


In  the  CMOS  inverter  of  Figure  2.2  (b)  the  input  is 
applied  to  the  gates  of  both  devices.  An  input  of  logical  1 
causes  the  n-channel  device  to  switch  on  and  the  p-channel 
device  to  switch  off,  resulting  in  an  output  of  logical  0. 
Similarly,  an  input  of  0  results  in  an  output  of  1.  In  both 
cases, one device is fully off, representing a resistance on
the order of gigaohms. Thus, the steady state power
consumption is essentially zero. In operation the only
power consumption of consequence occurs during the
transition when neither transistor is fully on or off.
Additionally, since the output load capacitance is both
charged and discharged through a turned on transistor, the 1
to 0 and 0 to 1 switching delays are theoretically the same.
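The inverter's behavior can be captured in a small switch-level model. This is a sketch under the idealization used in the text: each transistor is either fully on or fully off, and only steady-state logic values are considered. The function names are illustrative.

```python
def nmos_on(gate):
    """An n-channel device conducts when a logical 1 (Vdd) is on its gate."""
    return gate == 1


def pmos_on(gate):
    """A p-channel device conducts when a logical 0 (GND) is on its gate."""
    return gate == 0


def cmos_inverter(inp):
    pull_up = pmos_on(inp)     # connects the output to Vdd
    pull_down = nmos_on(inp)   # connects the output to GND
    # In steady state exactly one device conducts, so there is no
    # static path from Vdd to GND and no steady state power consumption.
    assert pull_up != pull_down
    return 1 if pull_up else 0


for value in (0, 1):
    print(value, "->", cmos_inverter(value))
```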

Actually the switching delays depend on many parameters.
The n-channel and p-channel device dimensions are
frequently not the same, and the mobility of the electrons in
the n-channel is greater than the mobility of the holes in
the p-channel. Also, the capacitive load seen by the
p-channel device in CMOS p-well (CMOS-pw) is greater than
the load seen by the n-channel device because of the highly
doped p-well. Typically, the result in CMOS-pw is a
slightly longer transition time of the 0 to 1 output
transition. Some designers attempt to compensate for this by
consistently making the p-channel transistors wider than the
n-channel transistors.
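A rough sizing rule follows from the mobility argument alone. The sketch below assumes that drive strength is proportional to carrier mobility times channel width at equal channel length, and it ignores the capacitive load difference mentioned above. The mobility ratio of 2.5 is a hypothetical value used only for illustration.

```python
def matched_pmos_width(nmos_width, mobility_ratio):
    """Width of the p-channel device that equalizes mobility * width,
    where mobility_ratio is electron mobility divided by hole mobility."""
    return nmos_width * mobility_ratio


# Hypothetical example: a 4 micron wide n-channel device and an
# assumed electron-to-hole mobility ratio of 2.5.
print(matched_pmos_width(4.0, 2.5), "microns")
```

The wider p-channel device also adds gate and diffusion capacitance, which is one reason the compensation is a design choice rather than a fixed rule.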

Unlike  NMOS,  the  output  of  a  CMOS  digital  circuit 
makes  a  full  excursion  between  Vdd  and  GND.  This  makes  CMOS 
circuits  less  sensitive  to  noise  than  NMOS  circuits.  CMOS 
should  also  benefit  more  from  future  reductions  in  feature 
size.  NMOS  is  more  restricted  in  ultimate  feature  size 
because  the  power  dissipation  requirements  of  the  depletion 
mode  devices  will  create  more  problems  as  feature  sizes 
shrink. In Figure 2.3 the relative sizes of minimum dimension
inverters implemented in currently available 3 micron
feature size CMOS-pw and NMOS technologies are shown.

2.  The NOR Gate and Transmission Gate

Figure 2.4 shows the circuit diagrams and layouts of
a two-input NOR gate implemented in both CMOS-pw and NMOS.
From Figures 2.3 and 2.4 it is evident that static¹ CMOS
gates are more complex and area consuming than their NMOS
counterparts. In these fully complementary circuits a
redundancy in the structures is evident. The pull-up only
or pull-down only would be sufficient to implement the
logic. In the CMOS circuits of Figures 2.3 and 2.4 the
inputs must perform two tasks. A logical 1 on an input
causes both a connection between the output and ground and a
disconnection between the output and Vdd. Logically these
two actions are equivalent, therefore only one action should
be necessary to implement the logic. Design methodologies
to accomplish this are described in section B of this
chapter.

¹ Static logic circuits continuously evaluate their
inputs and produce their specified logic output. Dynamic
circuits perform logical evaluation of the inputs only when
directed to do so by control signals and/or clock signals.

Figure 2.3  Minimum Dimension Inverters.

Figure 2.4  2-input NOR Gate.

The parallelism of the CMOS transmission gate of
Figure 2.5 and the NMOS pass transistor is evident. The
major difference lies in the bilateral nature of the CMOS
transmission gate. It is made up of both n-channel and
p-channel devices and requires both polarities of the
control signal for operation. The reason for this bilateral
requirement is that the p-channel device does not transmit
low voltages well and the n-channel device does not transmit
high voltages well. The resulting unpredictable voltage
drops make it necessary to utilize both types of
transistors. This increase in complexity over its NMOS counterpart
is partially offset by the absence of the level restoring
circuitry NMOS requires following a pass transistor.²
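The need for both device types in a transmission gate can be sketched with a simple voltage model. The model assumes that a conducting n-channel device cannot pull its output above Vdd minus its threshold voltage and that a conducting p-channel device cannot pull its output below the magnitude of its threshold voltage. The threshold values are hypothetical and body effect is ignored.

```python
VDD = 5.0
VTN = 1.0   # hypothetical n-channel threshold voltage, volts
VTP = 1.0   # hypothetical magnitude of the p-channel threshold, volts


def nmos_pass(v_in):
    """n-channel pass device with its gate at Vdd: degrades high voltages."""
    return min(v_in, VDD - VTN)


def pmos_pass(v_in):
    """p-channel pass device with its gate at GND: degrades low voltages."""
    return max(v_in, VTP)


def transmission_gate(v_in):
    """Both devices on in parallel: the one that passes the input
    without degradation sets the output."""
    n_out, p_out = nmos_pass(v_in), pmos_pass(v_in)
    return n_out if abs(n_out - v_in) <= abs(p_out - v_in) else p_out


for v in (0.0, 2.5, 5.0):
    print(v, nmos_pass(v), pmos_pass(v), transmission_gate(v))
```

In this model the single devices deliver 4.0 volts for a 5.0 volt input and 1.0 volt for a 0.0 volt input, while the parallel pair passes the full excursion in both directions.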


² In NMOS digital circuits the length to width ratio of
the depletion mode load transistor is usually four times
that of the pull-down transistor. This ratio is required to
ensure sufficient excursion of the output voltage. However,
after a pass transistor is used, a ratio of 8:1 rather than
4:1 must be used to compensate for the MOS threshold voltage
drop across the pass transistor.
Figure 2.5  CMOS Transmission Gate.

In general CMOS technologies are ratioless. The use
of "improper" ratios will not affect the logical operation
of most CMOS gates; it will only affect the speed of
operation of the gates.

B.       CMOS    DESIGN    METHODOLOGIES 

Static gate CMOS circuits have three serious
deficiencies when compared to static NMOS gates. First, they are
more area consuming. Second, they can be slower. Though
the individual gates can be faster in CMOS, the p-channel
and n-channel gates are in parallel; thus, the fanout³ and
the output load capacitance of each circuit are doubled.
Third, a CMOS static gate is redundant, duplicating its
functionality in both the pull-up and pull-down sections.
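The redundancy named in the third point can be checked directly for a two-input NOR gate. In the switch-level sketch below the pull-down network is two n-channel devices in parallel and the pull-up network is two p-channel devices in series; the function names are illustrative. For every input combination the two networks are exact complements, so either one alone determines the logic value.

```python
from itertools import product


def nor_pull_down(a, b):
    """Two n-channel devices in parallel between the output and GND."""
    return a == 1 or b == 1


def nor_pull_up(a, b):
    """Two p-channel devices in series between Vdd and the output."""
    return a == 0 and b == 0


def nor_truth_table():
    table = {}
    for a, b in product((0, 1), repeat=2):
        # The pull-up conducts exactly when the pull-down does not.
        assert nor_pull_up(a, b) != nor_pull_down(a, b)
        table[(a, b)] = 1 if nor_pull_up(a, b) else 0
    return table


print(nor_truth_table())
```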

One approach to remedy these deficiencies is to use a
static NMOS-like style of design as in Figure 2.6. Here the
p-channel device is always on and the pull-up to pull-down
dimension ratio is relied upon to produce the proper output
voltage. This introduces power consumption problems and
takes away the full excursion on the output. Another
approach is to make extensive use of transmission gates to
build up logic functions. Using transmission gates means
both polarities of all control signals are required. The
resulting large number of wires required to route these
control signals can become very area consuming, especially
if only one metal layer is available.

³ Fanout represents the number of transistors that the
output of a logic gate must drive.

Figure 2.6  NMOS-like CMOS Static Gate [Ref. 6].

A third and more effective solution is to use dynamic
logic. Figure 2.7 contains three different implementations
of a dynamic three-input NAND gate. In each, the output is
meaningful (i.e. represents the value of the boolean
expression in1 in2 in3) only when clk is high and the
complement of clk is low. The
circuits of Figure 2.7 (a) and (b) depend on the pull-up to
pull-down ratio to produce the proper output. As with the
NMOS-like style of design, full excursion on the output is
lost and there is steady state power consumption during the
evaluation cycle. The circuit in Figure 2.7 (c) is
precharged when clk is low and evaluation of the inputs takes
place when clk is high. This configuration allows only one
change of the output from 1 to 0, so the inputs must be
stable at the time clk goes high. A change of one of the
inputs from 1 to 0 after clk has gone high cannot cause the
output to return to 1.
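The one-way behavior of the precharged gate can be reproduced with a small behavioral model. This is a sketch of the behavior described for Figure 2.7 (c), not of its transistor-level circuit: the output node is charged to 1 while clk is low, may be discharged through the series n-channel stack while clk is high, and otherwise keeps its stored value. The class and method names are illustrative.

```python
class PrechargedNand3:
    """Behavioral model of a precharged three-input dynamic NAND gate."""

    def __init__(self):
        self.out = 1

    def step(self, clk, in1, in2, in3):
        if clk == 0:
            self.out = 1            # precharge phase
        elif in1 and in2 and in3:
            self.out = 0            # evaluation: the node is discharged
        # Otherwise the node holds its charge; once discharged it
        # cannot return to 1 until the next precharge.
        return self.out


gate = PrechargedNand3()
print(gate.step(0, 1, 1, 1))  # precharge
print(gate.step(1, 1, 1, 1))  # evaluate with all inputs high
print(gate.step(1, 0, 1, 1))  # an input falls late: output stays low
```

The third call shows why the inputs must be stable when clk goes high: the correct NAND value for inputs 0, 1, 1 is 1, but the model, like the circuit, still reports 0.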

Figure 2.7  Dynamic NAND Gates [Ref. 6].

In general dynamic CMOS eliminates the redundancy of
static CMOS by applying all inputs to one type of device and
a control signal to the other type of device. The most
popular dynamic CMOS logic design technique is domino CMOS
[Ref. 7], illustrated in Figure 2.8. Here the output is the
logical AND of the boolean function (in1 in2 + in3) to be
implemented and a control (clock) signal. When the clock is
low, the circuit is precharged, and when the clock is high
evaluation occurs. With a common clock shared by all the
domino gates on a chip, during the evaluation cycle the
signals ripple through the chip as though the logic were
purely static. The follow-on inverter ensures that the
output of each gate is low when evaluation begins. This
prevents the outputs of all gates from changing unless
driven low by the inputs.

Figure 2.8  Domino CMOS Structure [Ref. 6].

Domino CMOS is not always the
answer though. If the logic of Figure 2.9 were implemented
in domino CMOS it would be more area consuming than the same
circuit implemented in static CMOS. Dynamic CMOS is more
area consuming in this case because these are simple gates
with only a few inputs. Each NOR gate if implemented
statically would need two n-channel devices and two p-channel
devices. If implemented dynamically, each NOR gate requires
three transistors of one type (one for each input and one
for the control signal) and one transistor of the other type
(for the control signal again). The number of transistors
needed remains the same but the dynamic logic requires the
designer to keep three inputs electrically isolated instead
of just two. And if the dynamic design technique is domino,
six additional inverters will be needed. As can be seen in
Figure 2.4, in CMOS a NOR gate can be constructed from just
one stage. Adding the follow-on inverter of the domino
design results in an OR gate. Thus a second inverter is
required to return the logic to that of a NOR gate.
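The transistor counts in this argument can be tabulated. The sketch below follows the counting given in the text: a static gate uses one n-channel and one p-channel device per input, and a dynamic gate uses one device per input plus one control device of each type. It additionally assumes that every inverter costs two transistors, as in the inverter of Figure 2.2 (b). The function names are illustrative.

```python
INVERTER_TRANSISTORS = 2  # assumed: one n-channel and one p-channel device


def static_gate(n_inputs):
    """Fully complementary gate: one device of each type per input."""
    return 2 * n_inputs


def dynamic_gate(n_inputs):
    """One device per input plus a control device of each type."""
    return n_inputs + 2


def domino_nor(n_inputs):
    """Dynamic stage, follow-on inverter, and a second inverter to
    restore the NOR polarity."""
    return dynamic_gate(n_inputs) + 2 * INVERTER_TRANSISTORS


for n in (2, 8):
    print(n, static_gate(n), dynamic_gate(n), domino_nor(n))
```

For two inputs the counts are 4, 4 and 8, which matches the statement that domino is the more area consuming choice for simple gates. With eight inputs the dynamic stage needs 10 transistors against 16 for the static gate, which is where the dynamic styles begin to pay off.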


Figure 2.9  Circuit Difficult to Implement in Domino CMOS.


C.   CMOS  IMPLEMENTATION  TECHNOLOGIES 

One  of  the  principal  issues  in  the  design  of  a  process 
to  implement  CMOS  digital  circuits  in  silicon  is  how  to 
isolate  the  two  types  of  devices.  This  can  be  accomplished 
by  using  a  completely  insulating  substrate  or  through  a  more 
complex fabrication process.

1.  CMOS-SOS

The  only  process  currently  offered  by  Metal-Oxide 
Semiconductor  Implementation  Service  (MOSIS)  which  uses  an 
electrically  insulating  substrate  is  Silicon  on  Sapphire 
(SOS). In this technology the n-channel and p-channel
transistors are formed on silicon islands left after etching an
epitaxial layer of silicon on a sapphire (Al2O3) substrate.

2.  CMOS-Bulk

The other CMOS processes offered by MOSIS all use
CMOS-Bulk  p-well  technology.  The  p-well  processes  differ  in 
the  number  of  layers  of  metal  interconnections  (1  or  2)  and 
the  presence  or  absence  of  capacitors.  In  CMOS-Bulk  p-well 
(n-well)  the  substrate  is  n-doped  (p-doped)  and  the 
p-channel  (n-channel)  devices  are  in  this  substrate.  To 
isolate  the  n-channel  (p-channel)  devices  from  the  substrate 
a  heavily  doped  p-well  (n-well)  is  first  placed  to  act  as 
the  back  gate.  The  heavy  doping  of  the  p-well  (n-well) 
degrades  the  performance  of  the  n-channel  (p-channel)  device 
while  the  p-channel  (n-channel)  device  is  optimized.  In 
p-well  CMOS,  though  the  mobility  of  electrons  in  the 
n-channel  device  still  exceeds  that  of  the  holes  in  the 
p-channel device, the performance difference of the
transistors is minimized. The more uniform performance of the two
transistor types makes the p-well process appropriate for
CMOS random logic.

Figures 2.10 and 2.11 represent the top and side
views of the steps of the CMOS-pw process for the production
of an inverter. These steps are: (1) starting with an
n-type substrate the p-well is patterned, (2) the active
areas in the p-well and on the substrate are established,
(3) the polysilicon is patterned, (4) the two ion implant
masks are placed (the N+ mask is simply the photographic
negative of the P+ mask), (5) contact cuts are made, and
(6) the metal is placed.

a.      Latchup    in   CMOS-pw 

One of the main problems associated with
CMOS-Bulk, both p-well and n-well, is latchup. Basically
latchup involves generation of a short circuit between Vdd
and GND, and can result in the complete destruction of a
chip. Many researchers have tried to formally define the
conditions [Ref. 8] that cause latchup to occur. This task
is extremely complex because the phenomenon is so dependent
on layout, which is unique to each chip design. Though a
fully quantitative analysis of latchup is still not
available, a qualitative analysis will show what happens on the
chip when latchup occurs.

Figure 2.10  P-Well Process, Top View [Ref. 6].

Looking at the side view of an inverter in
Figure 2.12, parasitic bipolar transistors can be seen. The
base of the npn transistor is the p-well and the base of the
pnp transistor is the n-doped substrate. These parasitic
transistors are connected as shown in Figure 2.13. If the
output of the gates goes below GND by a value equal to the
threshold of the npn transistor, its emitter starts to
inject current (electrons) into the base (p-well) and the
resultant collector current flows to the Vdd node. If the
resistance between the Vdd node and the source of the
pull-up p-channel MOS transistor, R1, is large enough, the
voltage drop across R1 will exceed the threshold of the pnp
transistor. The collector current (holes) of the pnp device
flows to the GND node. If the resistance between the GND
node and the source of the pull-down n-channel MOS
transistor, R2, is great enough, the resultant voltage drop
across R2 will increase the base current in the npn
transistor. As is evident, there is positive feedback.
The only way to stop this destructive process once it has
started is to disconnect Vdd or GND. Prevention of latchup
must be designed in.
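The role of the resistances R1 and R2 can be illustrated with Ohm's law. The sketch below asks how much current must flow through a parasitic resistance before the voltage drop across it reaches the turn-on threshold of a bipolar transistor. The 0.7 volt threshold and the resistance values are assumptions chosen for illustration; they are not taken from the thesis or from any process data.

```python
V_ON = 0.7  # assumed bipolar turn-on threshold in volts (illustrative)


def trigger_current(resistance_ohm, v_on=V_ON):
    """Current at which the drop across the resistance reaches v_on."""
    return v_on / resistance_ohm


# Hypothetical parasitic resistances, before and after a layout change.
for r in (700.0, 70.0):
    print(f"R = {r:5.0f} ohm -> {trigger_current(r) * 1e3:.0f} mA to turn on")
```

In this simple picture a tenfold reduction in resistance raises the current needed to sustain the feedback tenfold, which is why layout measures that lower R1 or R2 make latchup harder to start.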
Figure 2.12  Bipolar Transistors in CMOS-Bulk [Ref. 6].

Figure 2.13  The Latchup Circuit [Ref. 6].

The MOSIS CMOS-Bulk p-well design rules include
features for the specific purpose of reducing the
probability of latchup. The minimum separation rules for
p-wells and P+ doped active areas exist for this purpose.
Their aim is to reduce the gain of the parasitic bipolar
transistors, thus requiring a larger noise spike of longer
duration to start the latchup sequence. A frequently used
technique is the grounding of the p-well as illustrated in
Figure 2.14. Here the effect of the P+ doped area covering
half of the contact cut for the ground bus is to reduce the
resistance R2 in Figure 2.13. Another practice is to place
a small capacitor across the Vdd and GND pins of CMOS-Bulk
chips. To provide capacitive filtering of noise spikes on
the chip, Vdd and GND busses are frequently run close
together. Also, Vdd input pads are designed to provide
capacitance between Vdd and GND.
3.  Twin-tub CMOS

This process, also called twin-well, uses both
n-wells and p-wells on a high resistivity N- or P-
substrate, or in an epitaxial layer of silicon on a P+ or N+
wafer. Since the well doping does not have to overcome the
substrate doping, both the n-channel transistors in the
p-well and the p-channel transistors in the n-well can be
optimized. Domino CMOS is enhanced by the use of this
process since the optimized n-channel devices can speed up
the complex boolean expression evaluation and the optimized
p-channel devices can speed up the signal drive between
stages (thereby reducing the effect of a given fanout).

D.       CMOS    TECHNOLOGY    SELECTION 

The  CMOS  implementation  technologies  available  from 
MOSIS  are  CMOS-Bulk  p-well  with  one  metal  layer,  CMOS-Bulk 
p-well  with  two  metal  layers,  CMOS-Bulk  p-well  with  two 
metal  layers  and  capacitors  (for  analog  circuits)  and 
CMOS-SOS. 

The  advantages  of  CMOS-Bulk  are:  (1)  very  good  noise 
margin,  (2)  faster  than  NMOS,  and  (3)  a  proven  reliable 
fabrication  process.  Its  disadvantages  are:  (1)  latchup 
susceptibility,  (2)  use  of  p-well  guard  rings  is  needed  if 
radiation  hardening  is  desired,  (3)  lower  circuit  density 
than  NMOS  or  CMOS-SOS,  and  (4)  more  complex  design  rules 
than    either   NMOS   or    CMOS-SOS. 

The  advantages  of  CMOS-SOS  are:  (1)  faster  than  NMOS  or 
CMOS-Bulk,  (2)  very  good  noise  margin,  (3)  intrinsically 
radiation  hardened,  and  (4)  no  latchup.  Its  disadvantages 
are:  (1)  expensive  fabrication  process  due  to  the  sapphire, 
(2)  sapphire  variability  reduces  the  reliability  of  the 
fabrication  process,  (3)  thermal  mismatch  between  the 
sapphire  and  silicon  limits  the  carrier  mobility,  and  (4)  it 
is  not  a  viable  technology  for  dynamic  memory  due  to  back 
channel leakage.

CMOS-Bulk p-well was selected as the implementation
process  for  the  adder  for  the  following  reasons.  First, 
technology  files  for  this  process  were  available  at  the 
Naval  Postgraduate  School  (NPS)  enabling  the  use  of  extant 
computer  aided  design  (CAD)  tools.  Second,  since  this  would 
be  the  first  CMOS  VLSI  design  at  NPS,  utilizing  the  most 
reliable  process  is  prudent  to  prevent  design  problems  from 
being clouded by implementation process problems.

III.  DESIGN TOOLS

To  employ  the  Mead-Conway  design  methodology  on  a  large 
scale  design,  three  computer  aided  design  (CAD)  tools  are 
needed. A layout design editor for viewing the circuits as
they are created is the first tool required. Next, a design
rule checker is necessary to confirm that all the design
rules for the specified technology have been adhered to.
Though not a complex task, the large number of checks that
must be made for even a modest design makes manual design
rule checking highly error prone. Finally, a circuit
simulator is needed to verify that the circuit as designed
provides the proper logical output. In the design of the
sixteen-bit pipelined adder, the Caesar layout editor
[Ref. 4], the Lyra design rule checker [Ref. 10], and C.
Terman's RNL circuit simulator [Ref. 11] were employed.

A.  CAESAR

Caesar is a generic layout editor. It is not designed
for any particular VLSI implementation technology. It is
not even limited to designing integrated circuits. Caesar
is a graphics layout editor for the creation and
manipulation of rectangles where the user specifies the color, size,
and placement. It is through the user specified technology
file that the rectangles of color take on meaning. At the
Naval Postgraduate School (NPS) there are two technology
files available for use with Caesar. One is for N-doped
metal oxide semiconductors (NMOS) and the other is for
complementary metal oxide semiconductors utilizing a P-doped
well (CMOS-pw).

Caesar works with files of its own special format.
These files are indicated by an appended file type of ca (i.e.
xxxx.ca). On command Caesar will generate a Caltech
Intermediate Format (CIF) file of the same layout. Again it
is the technology file which tells Caesar which CIF layer
labels to attach to the colored rectangles.

At  NPS,  Caesar  is  set  up  to  take  commands  from  any 
terminal where the execution of the Caesar program is
initiated (usually the ADM-3a console adjacent to the color
graphics  display  unit)  and  from  a  four-button  puck  on  a 
graphics  tablet  attached  to  the  color  display  device. 
Caesar  displays  its  graphics  results  on  an  AED  767  color 
monitor  and  displays  its  menus,  messages,  and  prompts  on  the 
command  console.  Detailed  information  on  the  installation 
and  operation  of  Caesar  at  NPS  can  be  found  in  Reference  4 
and  Reference  2. 

Caesar is an interactive CAD tool. The results of any
command are rapidly displayed on the AED 767. The results
of a command may be undone (u) or repeated (.) with a single
stroke of the specified key on the command console. While
running Caesar, a user may also call upon the design rule
checker, Lyra, to check the area inside and within three
Caesar units* of the current box for design rule violations.
This interactive use of the layout graphics display and the
design rule checker helps to ensure that there will not be
any design rule forced changes late in the design cycle when
changes are much more time consuming. With Caesar's level
of interaction with the designer, the design loop consisting
of (1) issue commands to perturb existing circuit, (2)
visual inspection to verify command's generation of desired
results, and (3) design rule checking of new circuit, can be
rapidly completed.

* A Caesar design is laid out on a grid of Caesar units.
These units do not represent any specific length. When
creating a CIF file from a Caesar file the desired length of
a Caesar unit is specified.

Caesar is a hierarchical design tool. With Caesar,
circuits can be created by piecing together cells (other
files of type .ca) which in turn may be made up of other
sub-cells. Theoretically, there is no limit to the number
of levels in the hierarchy. Not only can cells (sub-cells,
etc.) be called upon to fill locations in a circuit; if they
need to be modified to function properly, Caesar also
provides a subedit mode to facilitate editing of layouts one
level below the current editing level. Care must be taken
when this subedit feature is used since the changes made to
the cell are global. Everywhere the given cell is used on the
chip, the newly edited version will appear.
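
The global effect of a subedit can be pictured with a minimal
Python sketch. The Cell class and its fields are hypothetical and
do not reflect Caesar's file format; the sketch assumes only that
a parent cell holds references to its sub-cells rather than
copies, and it ignores placement offsets.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical stand-in for a Caesar cell (.ca file)."""
    name: str
    rects: list = field(default_factory=list)     # (layer, x1, y1, x2, y2)
    children: list = field(default_factory=list)  # references, not copies

    def flatten(self):
        """Return every rectangle in this cell and in all cells below it."""
        out = list(self.rects)
        for child in self.children:
            out.extend(child.flatten())
        return out


contact = Cell("contact", rects=[("cut", 0, 0, 3, 3)])
inverter = Cell("inverter", children=[contact, contact])  # one cell, used twice
chip = Cell("chip", children=[inverter, inverter])

before = len(chip.flatten())
contact.rects.append(("metal", -1, -1, 4, 4))  # a "subedit" of the shared cell
after = len(chip.flatten())
```

Because every use of the contact cell is a reference to the same
object, the single edit shows up at all four places where the
cell is used.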

B.   LYRA

Like Caesar, Lyra is a generic design rule checker.
When  Lyra  is  invoked  from  within  Caesar,  the  actual  program 
executed  to  check  for  design  rule  errors  depends  on  the 
technology  file  indicated  in  the  header  of  the  Caesar  file 
being  edited.  After  running,  Lyra  sends  a  message  to  the 
command  console  indicating  the  number  of  errors  found.  On 
the  graphics  display  Lyra  paints  the  exact  location  of  each 
error  and  labels  each  error  with  the  design  rule  violated. 
The  error  label  consists  of  abbreviations  for  the  layers 
involved,  followed  by  an  underscore,  followed  by  an  abbrevi- 
ation for  the  type  of  violation  detected.  Table  1  lists  the 
abbreviations    used    by  Lyra   for    CMOS-pw. 

TABLE 1
Lyra Error Abbreviations

Layer                   Abbreviation
polysilicon             p
metal                   m
p-well                  w
n+ diffusion            d
cut                     c
p+ diffusion            P

Error                   Abbreviation
minimum width           *
minimum separation      s
malformed transistor    X

The winter 1983 distribution of the University of
California at Berkeley (UCB) CAD tools included two versions
of Lyra: one for the Mead-Conway NMOS design rules and the
other for the Jet Propulsion Laboratory's (JPL) five-micron
feature size CMOS-pw design rules. Since MOSIS no longer
supports fabrication of the JPL CMOS-pw process, design
rules for the MOSIS supported three-micron CMOS-pw process
were obtained. Professor Marco Annaratone at
Carnegie-Mellon University (CMU) generated the listing of the
three-micron CMOS-pw design rules compatible with Lyra and
has provided NPS with a copy. To generate executable code
from the prototype Lyra program and embed the specific
process design rules, the program rulec (see Appendix B) is
run with the design rule list file as its argument.

Now,  when  Lyra  is  invoked  from  Caesar  while  editing  a 
CMOS-pw  technology  circuit,  the  three-micron  minimum  feature 
size  CMOS-pw  design  rules  are  applied.  This  version  of  Lyra 
does  not  check  for  exceeding  any  maximum  dimensions.  The 
only  maximum  size  design  rule  in  this  technology  is  for 
contact  cuts,  which  may  not  exceed  3  microns  by  8  microns. 
Avoidance  of  improper  contact  cuts  can  be  accomplished  by 
utilizing  Caesar's  hierarchical  nature.  Contact  cuts  of  all 
needed  sizes  and  types  are  generated  once  and  saved  to  be 
inserted  as   cells    wherever   needed. 

C.   SIMULATION 

Once  a  circuit  layout  has  completed  this  initial  design 
loop,  it  matches  the  designer's  conception  of  how  it  should 
appear  and  is  free  of  design  rule  violations.  The  perform- 
ance of  the  given  circuit,  though,  remains  uncertain.  To 
simulate  the  performance  of  the  design,  programs  such  as 
SPICE [Ref. 11] and RNL [Ref. 11] are used.

1.   SPICE

SPICE  is  an  important  simulation  tool  in  the  design 
of  high  speed  CMOS  digital  and  analog  circuits.  With  its 
detailed  device  modeling,  SPICE  can  provide  accurate 
predictions  of  performance  once  the  device  parameters  of  the 
implementation technology are known. SPICE provides the
logical output of a circuit based upon the inputs and
describes  the  transient  behavior  of  the  circuit  as  it 
changes  to  the  new  logical  output.  Thus  SPICE  enables  a 
designer  to   optimize    transistor   dimensions    for  speed. 

Unfortunately, the version of SPICE currently available
on both the Vax 11-780 and the IBM 3033 at NPS (version
2G6) fails when the parameters of the devices fabricated by
the MOSIS three-micron CMOS-pw process are used. With these
parameters the transient behavior solutions do not converge.

Engineers at CMU, UCB, and the University of
Washington (UW) are currently employing an experimental
version of SPICE (version 2X.x, developed at UCB) which is
successful in simulating with the three-micron CMOS-pw device
parameters. This version, however, has other bugs and is
therefore not available for general distribution. The
changes to SPICE 2G6 that enable SPICE 2X.x to simulate the
three-micron CMOS-pw devices will be incorporated into the
next distribution of SPICE (version 2G7). The Naval
Postgraduate School is in the queue of institutions to
receive SPICE 2G7 once it is ready.

In order to run a SPICE simulation of a CMOS circuit
designed using Caesar, the following steps should be
executed. First, the labeling feature of Caesar is used to
place labels on the electrical nodes of interest in the
circuit (Vdd, GND, input, output, etc.). Second, the Caesar
command

:    cif    100   -p

is issued to generate the basename.cif file. The parameter
100 indicates a scale of 100 centimicrons per Caesar unit⁵
and must be specified unless the default value of 200
centimicrons per Caesar unit is desired. The -p option
causes entries to be made in the basename.cif file for the
labels assigned. Third, after exiting Caesar and returning
to Unix, the circuit extractor Mextra [Ref. 10] is invoked
using the command

%    mextra basename

to create the file basename.sim. To convert the basename.sim
file to a SPICE file (basename.spice), the program sim2spice
[Ref. 11] is used. The basename.spice file contains a list
of transistors and capacitors in the circuit in a SPICE
compatible format.
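
As a small worked example of the scale parameter, the sketch
below converts a dimension in Caesar units to microns. The
function and constant names are hypothetical; only the 100 and
200 centimicron values come from the text above.

```python
CENTIMICRONS_PER_MICRON = 100
DEFAULT_SCALE = 200  # centimicrons per Caesar unit when no scale is given


def caesar_units_to_microns(units, scale=DEFAULT_SCALE):
    """Convert a length in Caesar units to microns for a given cif scale."""
    return units * scale / CENTIMICRONS_PER_MICRON


# A 3-unit-wide feature written with "cif 100 -p" is 3 microns wide;
# the same feature written at the default scale would be 6 microns wide.
width_at_100 = caesar_units_to_microns(3, scale=100)
width_at_default = caesar_units_to_microns(3)
```

This is why the scale must be given explicitly when a layout has
been drawn at one micron per Caesar unit.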

The basename.spice file must be edited to add the
model parameters for the transistors, to specify the
waveforms of the input(s), to specify the type of analysis to
be performed (usually transient analysis) and to specify the
output to be produced (tables, graphs, etc.). The Spice
User's Manual [Ref. 11] contains the formats of these
additions to basename.spice. Best case and worst case device
model parameters for the MOSIS three-micron CMOS-pw process
as compiled by Dr. M. Annaratone of CMU and Dr. L. Glasser
of MIT are found in Appendix A.

⁵Since the minimum dimensions for the 3-micron CMOS-pw
process are specified in microns instead of lambda, CMOS-pw
circuits are usually designed on Caesar using one micron per
Caesar unit.

2.   RNL

RNL is a timing and logic simulator for digital MOS
circuits. It is an event driven simulator which uses a
resistance-capacitance model of a circuit to estimate node
transition times and to estimate the effects of charge
sharing.⁶ After input values have been assigned by the user,
RNL calculates the effects of those inputs by repeating the
following operations until there are no further node value
changes: (1) when a node is added to the network due to a
transistor being turned on, the charge sharing implications
of the new node's capacitance and logic state on each of its
electrical neighbors are computed, (2) for each node that
might be affected, Vthev and Rthev (the parameters of the
Thevenin equivalent circuit) are calculated and the new
logic state is determined from Vthev (0.0 Vdd to 0.3 Vdd =
logic 0, 0.8 Vdd to 1.0 Vdd = logic 1, logic X otherwise), (3)
if the node has changed state, the transition time is
calculated using the node's capacitance, and (4) any changes
are propagated to other nodes. Details of the computation
methods used by RNL can be found in the RNL Version 4.2 (UW)
User's Guide [Ref. 11]. More important to the user is an
understanding of what information RNL keeps, what it
discards, and how it decides what to do next.
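
The threshold rule in step (2) can be written as a short Python
sketch. The function name and signature are hypothetical; only
the 0.3 Vdd and 0.8 Vdd thresholds are taken from the text, and
the sketch makes no claim about how RNL itself is coded.

```python
def logic_state(v_thev, vdd, low=0.3, high=0.8):
    """Map a Thevenin voltage to logic "0", "1", or "X" (undefined)."""
    ratio = v_thev / vdd
    if 0.0 <= ratio <= low:
        return "0"
    if high <= ratio <= 1.0:
        return "1"
    return "X"  # anything between the two thresholds is undefined


# With a 5 volt supply: 1.0 V reads as logic 0, 4.5 V as logic 1,
# and 2.5 V falls in the undefined band.
states = [logic_state(v, 5.0) for v in (1.0, 4.5, 2.5)]
```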

Basic to the operation of RNL is the idea of an
event. The three elements of an RNL event are: (1) a node
in the network, (2) a new logic state for the node, and (3)
the time when the node value changes to the new logic state.
RNL maintains a list of events, sorted by time, that tells
what processing remains to be done. When the user changes
an input, an event is added to the list. RNL sequentially
processes the next event on the list, stopping when (1) the
list is empty, (2) a node the user is tracing changes value,
or (3) the specified simulation time interval has
elapsed. To process an event, RNL removes it from the list,
changes the node's state to reflect its new value, and then
calculates any new events resulting from the node's new
value.

⁶Charge sharing refers to the capacitive effects that
happen when two or more previously unconnected nodes, each
having some charge and capacitance, become connected by a
resistor (transistor turning on).
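
The event list and its three stopping conditions can be sketched
in Python as follows. This is an illustration under stated
assumptions, not RNL's implementation: the function name, the
tuple layout (time, node, new state), and the use of a heap for
the time-sorted list are all hypothetical, and the calculation of
follow-on events is left as a comment.

```python
import heapq


def run(events, node_values, traced=(), stop_time=float("inf")):
    """Process (time, node, new_state) events in time order.

    Returns (reason, time), where reason names the stopping condition.
    """
    heapq.heapify(events)  # keep the list sorted by event time
    now = 0.0
    while events:
        if events[0][0] > stop_time:
            return "interval elapsed", stop_time
        now, node, state = heapq.heappop(events)  # remove event from list
        changed = node_values.get(node) != state
        node_values[node] = state                 # node takes its new value
        # A full simulator would calculate the resulting new events here
        # and push them onto the list with heapq.heappush.
        if changed and node in traced:
            return "traced node changed", now
    return "list empty", now


values = {"a": "0", "out": "1"}
result = run([(2.0, "out", "0"), (1.0, "a", "1")], values, traced={"out"})
```

In this usage the event on node a is processed first because it
is earlier in time, and processing stops when the traced node out
changes value.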

In  calculating  new  events,  first  all  nodes  that 
might  be  affected  by  the  change  are  found  and  marked.  This 
includes the source and drain of all transistors for which
the  current  node  is  the  gate  and  all  nodes  connected  to 
these  nodes  through  turned  on  transistors.  The  search 
through  the  network  stops  when  a  non-conducting  transistor 
or  an  input  is  reached.  For  each  marked  node,  two  calcula- 
tions are  made.  First,  a  charge  sharing  calculation  is 
performed  to  model  changes  of  state  due  to  the  charging  and 
discharging  of  node  capacitances.  Second,  a  final  value 
calculation  is  done  to  determine  the  node's  ultimate  logical 
state. 

A  given  node  can  have  only  two  events  pending:  (1)  a 
charge  sharing  event  describing  an  immediate  change  in  the 
node's  state  due  to  charge  redistribution  among  the  nodes  on 
the  connection  list,  and  (2)  a  final  value  event  describing 
the  final,  driven  state  of  the  node.  RNL  observes  the 
following  rules  for  processing  events:  (1)  when  a  new  charge 
sharing  event  is  scheduled,  throw  away  all  previously 
pending  events  for  the  node,  and  (2)  when  a  new  final  value 
event  is  calculated,  it  will  be  ignored  if  (a)  there  is  a 
pending  final  event  for  the  same  value  which  is  scheduled  to 
occur  sooner,  (b)  there  is  a  pending  charge  sharing  event 
for  the  same  value  as  the  new  final  event,  or  (c)  there  is 
no  charge  sharing  event  and  the  new  final  value  event  is  the 
same  as  the  node's  current  value.  These  rules  are  based  on 
the  assumption  that  the  event  that  was  last  calculated 
reflects  the  latest  configuration  of  the  network  and  there- 
fore should  override  events  calculated  earlier.  Charge 
sharing  events  discard  any  pending  final  value  events 
because  any  charge  sharing  calculation  is  immediately 
followed  by  a  new  final  value  calculation. 
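
The two rules above can be restated as a minimal Python sketch.
The data layout is an assumption made for illustration: each node
has a hypothetical dictionary with at most one pending charge
sharing event and one pending final value event, each stored as a
(time, value) pair.

```python
def schedule_charge_sharing(pending, time, value):
    """Rule 1: a new charge sharing event discards all pending events."""
    pending["final"] = None
    pending["charge"] = (time, value)


def schedule_final_value(pending, time, value, current_value):
    """Rule 2: return True if the new final value event is kept."""
    final, charge = pending["final"], pending["charge"]
    if final is not None and final[1] == value and final[0] < time:
        return False  # (a) same value is already scheduled to occur sooner
    if charge is not None and charge[1] == value:
        return False  # (b) pending charge sharing event has the same value
    if charge is None and value == current_value:
        return False  # (c) no charge sharing event and no change of value
    pending["final"] = (time, value)  # latest calculation overrides earlier
    return True


pending = {"charge": None, "final": None}
kept_first = schedule_final_value(pending, 5.0, "1", current_value="0")
kept_later_duplicate = schedule_final_value(pending, 7.0, "1", current_value="0")
schedule_charge_sharing(pending, 6.0, "X")
```

After the last call the earlier final value event is gone, which
is the behavior that causes the simulation failures described
next.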
These event rules, however, sometimes lead RNL to
generate incorrect results. This is especially true of
signal driven circuits (circuits where inputs are applied to
the source and drain of a transistor as well as its gate)
and circuits whose behavior depends on the analog properties
of the devices. For example, consider the first exclusive OR
gate design for the pipelined adder in Figure 3.1. This
design has proven to function correctly at CMU; however, the
RNL simulation shows this circuit failing.

Figure 3.1    CMOS Exclusive OR [Ref. 6].

Starting in a state where A=0, B=1, and out=1,
assume that the input A then transitions to 1. Initially
Q1, Q3, Q4, and Q6 are on. When input A goes high, Q3 is
turned off (no events generated) and Q2 is turned on,
generating a charge sharing event and a final value event for
Abar, resulting in Abar going low. When Abar goes low, the
still turned on Q6 is now trying to drive the output node
low and the still turned on Q4 (RNL recognizes that it takes
a finite amount of time for Q4 to turn off but does not
recognize that n-channel transistors do not conduct high
voltages well) is still trying to drive the output node
high. The result is an output of X, the undefined state.
Next, Q4 is turned off. Since turning off Q4 adds no new
nodes to the network, the event list is empty and the output
remains at X. The primary difficulty RNL has with this
circuit centers around the fact that the output node is
controlled by two nodes that can change at different times.
As a result, a charge sharing event due to one input can
eliminate a final value event of the other, with that final
value event being the force which determines the circuit's
actual behavior.

The circuit of Figure 3.2 is a proven latch design
which also fails in RNL simulation. In Figure 3.2 the
fractions next to the transistors represent the length to
width ratios of the devices. This circuit is dependent on
these ratios for proper operation. These ratios ensure that
the gain of the input signal on the gates of Q5 and Q6 is
greater than the gain of the feedback signal to the same
gates. RNL does not recognize the difference in these gains
to be sufficient to cause the gates of Q5 and Q6 to be at
either logical 1 or 0 when the input signal is the opposite
of the feedback signal. As a result, the circuit becomes
locked up at X. Because of RNL's difficulty with these two
circuits, other designs were employed in the final adder
(see chapter 5) to facilitate testing of the overall design.

Figure 3.2   CMOS Latch Design [Ref. 6].

To use RNL as installed at NPS, the following steps
should be followed. First label the circuit and generate
basename.cif as before. Again the program Mextra is used to
extract the circuit, this time with the -o option (mextra
basename -o). The -o option causes Mextra not to compute
capacitances. A follow-on program in this sequence, Presim,
performs this computation with greater accuracy. It should
be noted that there are three different circuit extraction
programs, each named Mextra. There is the MIT version, the
UCB version and the UW modified UCB version. The next tool
to be used in the sequence, Presim, can accept the output
format of the MIT version and the UW modified UCB version.
At NPS, the UCB version is installed and was used. The MIT
and UW modified UCB versions differ in the order of the
parameters in a transistor specification. Professor
Annaratone at CMU developed a program, cformat, to change a
.sim file generated by the UCB version to the MIT format.
However, cformat does not work if the -o option is used with
Mextra. To avoid a loss of accuracy, the .sim file can
manually be changed to the UW modified UCB format. The
first step in this format change is to use the vi text
editor to add "format: UCB" to the header line of
basename.sim. The other change that needs to be made is to
change the labels for the n-channel transistors from "n" to
"e". Using the ex editor, the following steps accomplish
this:

%   ex basename.sim   -    invokes the editor

:   g/^n/s//e/g       -    make global change:
                           for all n as first char
                           in a line, change to e

:   w                 -    write back edited file

:   q                 -    exit editor
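
The same two manual changes can be expressed as a short Python
sketch. It is a hedged illustration: the function name is
hypothetical, the header is assumed to be the first line of the
file, and the sample .sim text is invented for the example rather
than taken from Mextra output.

```python
def to_uw_ucb_format(sim_text):
    """Apply the two manual edits described above to the text of a .sim file."""
    lines = sim_text.splitlines()
    if lines:
        # Step 1: add the format tag to the header (assumed to be line one).
        lines[0] = lines[0].rstrip() + " format: UCB"
    # Step 2: same effect as the ex command g/^n/s//e/g, which rewrites
    # a leading "n" as "e" on every line that starts with "n".
    lines = ["e" + ln[1:] if ln.startswith("n") else ln for ln in lines]
    return "\n".join(lines) + "\n"


# Hypothetical sample: a header line, one n-channel and one p-channel device.
sample = "| units: 100 tech: cmos-pw\nn in out gnd 3 4\np in out vdd 3 4\n"
converted = to_uw_ucb_format(sample)
```

Only lines that begin with a lower case n are touched, so
p-channel entries pass through unchanged.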

The  next  step  is  to  create  a  binary  file  for  RNL 
from  basename.sim  using  Presim.  This  is  done  by  issuing  the 
command : 

%   presim   basename.sim   basename   config 

Basename.sim is the edited .sim file and basename is the
file into which Presim writes its binary output. Config is
the calibration file used to select other than default
values for the circuit element capacitance and resistance.
A copy of the Presim user's guide from the UW/NW VLSI
Consortium release 2.0 and the calibration file used in
simulating the adder are contained in Appendix C. The
values used in the calibration file are taken from the MOSIS
supplied electrical parameters.

The final step is to run RNL itself. This is done
by entering one of the following two Unix commands:

%   rnl

or

%   rnl   cmdfile

where cmdfile is the name of a file containing a sequence of
RNL commands. Entering the first Unix command will cause
RNL to take its commands directly from the console
interactively. If the second Unix command is used,
specifying a command file, RNL first executes all the
commands in cmdfile and, upon completion, starts taking
commands from the console. In either case, RNL should be
given the following commands:

(load "uwstd.l")

(load "uwsim.l")

(read-network "basename")

where  basename  is  the  file  generated  by  presim.  The  first 
two  commands  load  RNL  with  several  macros  which  simplify 
user  interfacing  with  RNL. 

The user interface with RNL is a LISP interpreter.
The interpreter continuously executes the loop: (1) read a
command, (2) evaluate the command and perform the specified
actions, and (3) print the result. There are two formats
for specifying commands to this loop. The first is:

(function argument argument ... argument)

Here the parentheses delimit the command and spaces separate
the elements. The interpreter reads the entire command, up
to the closing parenthesis, then the first element is
interpreted as a function and all the others as arguments.
The arguments may be of the same command form, (function arg
arg ... arg). If the following command were issued to RNL,

(* 12 (+ 2 2) (/ 14 7))

RNL would respond by typing 96 (12*4*2). The other format
for commands to RNL is

(function '(argument argument ... argument))

where the "'" indicates the quote special form which keeps
its argument from being evaluated. For example, (+ 2 3)
evaluates to 5, but '(+ 2 3) is a list of three elements.
When this second RNL command format is not used to represent
an argument of another command (i.e. is not contained within
the parentheses of another command), it may be written in
the more natural form:

function argument argument .... <newline>
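
The read, evaluate, print behavior described above can be
imitated with a very small Python sketch. It is not RNL's
interpreter: the function names are hypothetical, only the four
arithmetic operators are supported, and division is assumed to be
integer division.

```python
import operator
from functools import reduce

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}


def tokenize(text):
    """Split a command into parentheses, quote marks, and atoms."""
    for mark in "()'":
        text = text.replace(mark, f" {mark} ")
    return text.split()


def parse(tokens):
    """Build a nested list from a token list (consumes the tokens)."""
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return items
    if tok == "'":
        return ["quote", parse(tokens)]
    return int(tok) if tok.lstrip("-").isdigit() else tok


def evaluate(expr):
    """First element is the function, the rest are its arguments."""
    if not isinstance(expr, list):
        return expr
    head, *args = expr
    if head == "quote":
        return args[0]  # quoted argument is returned unevaluated
    return reduce(OPS[head], [evaluate(a) for a in args])


product = evaluate(parse(tokenize("(* 12 (+ 2 2) (/ 14 7))")))
quoted = evaluate(parse(tokenize("'(+ 2 3)")))
```

The nested arguments are evaluated first, giving 12*4*2, while
the quoted form comes back as its three elements.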

Tutorials on RNL are contained in the University of
Washington/Northwest VLSI Consortium's VLSI Design Tools
Reference Manual [Ref. 11]. There are two points concerning
the Mextra, Presim, RNL simulation cycle a user should be
aware of that are not brought out in the documentation. The
first concerns the use of vectors in RNL commands. As
evidenced in the tutorials of Reference 11 and the adder
simulation results in Appendix D, vectors can be used to
make the input and output of RNL less cumbersome and
verbose. After the vector has been defined, a user will
then want to assign values to it. The documentation shows
the format of the vector value assignment command to be:

(invec '(vecname values))

However, the "values" field has its own specific format.
The first character should be a 0 or a 1 indicating positive
and negative numbers, respectively. The LISP interpreter
will work with negative numbers but RNL will not accept
negative numbers as logical inputs. The second character is
a letter specifying the number base of the input vector (b
for binary, h for hexadecimal). For example, to assign the
binary value +101010 to the vector vectone, the RNL command
would be:

(invec '(vectone 0b101010))
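
The layout of the values field can be illustrated with a small
Python helper that builds the command text. The helper name is
hypothetical, and the use of lower case hexadecimal digits is an
assumption, since the text above gives only a binary example.

```python
def invec_command(name, value, base="b"):
    """Build an invec command string for a non-negative integer value."""
    if value < 0:
        raise ValueError("RNL does not accept negative logical inputs")
    if base not in ("b", "h"):
        raise ValueError("base must be 'b' (binary) or 'h' (hexadecimal)")
    digits = format(value, "b" if base == "b" else "x")
    # Leading 0 marks a positive number; the letter gives the number base.
    return f"(invec '({name} 0{base}{digits}))"


command = invec_command("vectone", 0b101010)
```

For the binary value 101010 this yields the command shown above.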

The other point concerns the location of input
labels on the input pads. When the entire chip is being
simulated, the input labels are normally placed on the metal
pads where the off chip leads are attached. Before an input
signal from a bonding pad reaches the interior circuits of a
chip it must pass through a resistor in an overvoltage
protection circuit. In the extraction and simulation
process this resistor is viewed as an open circuit.
Therefore, on input pads, the input label must be placed
after the resistor in the signal path.

With Caesar, Lyra, and RNL, a designer at NPS has
the  requisite  CAD  tools  for  the  complete  logical  circuit 
design  loop.  With  these  tools  circuits  that  are  free  of 
design  rule  errors  and  produce  the  desired  logical  results 
can  be  designed.  The  lack  of  SPICE  somewhat  restricts  the 
designer's  ability  to  optimize  speed,  but  there  are  several 
design  techniques  that  can  be  employed  to  design  chips  that 
run  fast.   These  will  be  covered  in  the  next  chapter. 




IV.    DESIGN    OF    THE    ADDER 

As stated in the introduction, the primary goals of the
adder design are to maximize throughput and to provide for
testability. The adder is to be a pipelined adder. Every
clock cycle it should accept as inputs two 16-bit addends
(A1, the least significant bit, through A16 and B1, the
least significant bit, through B16) and one carry-in (Cin)
bit. It is desired to produce the 16-bit sum (S1, the least
significant bit, through S16) and the carry-out (Cout) bit
as quickly as possible. Both the number of clock cycles
from input of the addends to the output of the sum and the
duration of each clock cycle are to be minimized. A
secondary consideration in the design is expandability. An
expandable design is one that can easily be extended to
produce a 32-bit or 64-bit sum utilizing the same circuit
structures. In this chapter the logical design and layout
design of the 16-bit adder will be presented. The equations
presented in this chapter are taken or derived from
equations found in chapters three through six of The Logic of
Computer Arithmetic by Flores [Ref. 12]. In these equations
concatenation implies the logical AND, the symbol + implies
the logical OR, and the symbol ⊕ implies the logical XOR.

A.       LOGICAL    DESIGN 

In considering the speed spectrum of adders from a
logical standpoint, at the fast end there is the table
look-up. With 33 binary inputs and 17 outputs, this would
require an address space of 2^33 17-bit words. With current
technology this is not feasible. At the other end of the
spectrum is the serial adder. On clock cycle 1 it uses A1,
B1, and Cin to produce S1 and C1out (carry out of bit 1
into bit 2). On clock cycle 2 it uses A2, B2, and C1out to
generate S2 and C2out. Here 16 clock cycles elapse before
the sum is available. An adder can also be implemented as a
ripple carry adder where the duration of each clock pulse is
sufficient to allow a carry into the sum to propagate all
the way through to a carry out. In the case of the 16-bit
adder, this would require a clock duration at least sixteen
times the length of the gate delay of the one bit adder.
The middle ground belongs to the carry look-ahead adder
[Ref. 3]. In carry look-ahead (CLA) addition the carry into
each bit position, C(i), is generated from the propagate,
P(i), and generate, G(i), primitives.

$P(i) = A(i) \oplus B(i)$    (eqn 4.1)

$G(i) = A(i) B(i)$    (eqn 4.2)

P(i) = 1 implies that a carry into bit (i) will be propagated
through to bit (i+1). G(i) = 1 implies that A(i) and B(i)
will provide a carry into bit (i+1) of the sum, regardless of
the contents of the less significant bits of A and B.

$C(i) = G(i-1) + G(i-2) P(i-1) + \cdots + C_{in} P(i-1) \cdots P(2) P(1)$    (eqn 4.3)

$S(i) = C(i) \oplus P(i)$    (eqn 4.4)

The algorithm for the CLA sum generation is as follows. The
first event is the evaluation of equations 4.1 and 4.2 to
generate the P(i) and G(i) primitives. The second event uses
the P(i) and G(i) primitives as inputs to equation 4.3 to
generate the C(i)'s. The final event is the computation of
the S(i)'s from equation 4.4.
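
The three events of the CLA algorithm can be checked with a short
Python sketch that evaluates equations 4.1 through 4.4 directly.
The function name and the use of Python integers to hold the
addends are assumptions made for illustration; list index 0
corresponds to bit 1, the least significant bit.

```python
def cla_add(a, b, cin, n=16):
    """Add two n-bit numbers using equations 4.1 to 4.4.

    Returns (sum, carry_out).
    """
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]

    # First event: propagate and generate primitives.
    P = [x ^ y for x, y in zip(A, B)]  # eqn 4.1
    G = [x & y for x, y in zip(A, B)]  # eqn 4.2

    # Second event: each carry as a sum of products (eqn 4.3),
    # written out term by term rather than as a ripple recurrence.
    C = []
    for i in range(n + 1):             # C[i] is the carry into index i
        carry = cin                    # term: Cin P(i-1) ... P(1)
        for k in range(i):
            carry &= P[k]
        for j in range(i):             # terms: G(j) followed by P's above it
            term = G[j]
            for k in range(j + 1, i):
                term &= P[k]
            carry |= term
        C.append(carry)

    # Final event: sum bits.
    S = [C[i] ^ P[i] for i in range(n)]  # eqn 4.4
    total = sum(bit << i for i, bit in enumerate(S))
    return total, C[n]                   # C[n] is the carry-out


result = cla_add(0xFFFF, 1, 0)
```

Adding 1 to a word of all ones propagates the carry through every
bit position, so the 16-bit sum is zero and the carry-out is set.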
As pointed out by Flores [Ref. 12] and by Conradi and
Hauenstein [Ref. 3], there are several logical
implementations of carry look-ahead addition. A principal
task of this thesis investigation was to select a fast
logical design. Without the circuit simulator SPICE, the
analysis of each design considered was more qualitative than
quantitative. In this qualitative analysis, a turned on
transistor is considered as a resistor with its resistance
proportional to its length and inversely proportional to its
width. All gates driven by such a turned on transistor are
considered to be capacitive loads with capacitance
proportional to the area of the gate. The interconnect
wiring is considered to add both parallel capacitive loading
and series resistance as shown in Figure 4.1.

Figure 4.1    CMOS Output Loading Model.

From this model it is clear that the amount of
interconnect wiring and the number of gates driven (fanout)
should be minimized to minimize the output transition time
when the positions of switches S1 and S2 of Figure 4.1 are
reversed. This led to the following guidelines in the
design of the adder:

1)  The internal logic of each stage should be
accomplished with minimum dimension transistors, 3 microns
x 4 microns (length x width). This leads to more
compact circuits with shorter interconnections and
reduces the capacitive load on the preceding stage.

2)  Significantly wider transistors (3-micron x 9-micron)
should be used at the output of each stage where the
fanout and interconnect loading is greater.

3)  The fanout of any transistor should be kept to less
than five.

This  requires  a  more  complete  definition  of  fanout 
because  the  capacitive  loading  of  a  gate  depends  on  its 
area.  A  3-micron  x  4-micron  transistor  driving  six  other 
3-micron  x  4-micron  transistors  has  a  fanout  of  six.  A 
3-micron  x  8-micron  transistor  driving  the  same  load  is 
considered  to  have  a  fanout  of  three.  Though  this  implies 
that  a  high  fanout  problem  can  be  solved  by  merely 
increasing  the  width  of  the  driving  transistor,  it  neglects 
the  effects  of  the  interconnect  wiring.  As  gates  are  added 
to  the  load  of  a  transistor,  each  subsequent  addition  must 
be  more  remote  from  the  driving  transistor.  Since  the 
resistance  of  the  wiring  is  proportional  to  its  length  and 
inversely  proportional  to  its  width,  the  resistance  of  the 
wiring  will  increase  unless  the  width  is  also  increased. 
However,  since  the  capacitance  of  the  wiring  is  proportional 
to  its  area,  most  of  the  gain  achieved  by  widening  the  wire 
to  reduce  resistance  is  offset  by  the  increase  in  capaci- 
tance. As  a  result,  in  the  design  of  the  adder,  increasing 
the  width  of  the  driving  transistor  was  not  viewed  as  a 
complete   fix  for  a   fanout   problem. 
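
One reading of this area-based definition that reproduces both
examples is the total gate area driven divided by the gate area
of the driving transistor. The Python sketch below encodes that
reading; the function name and the formula itself are an
interpretation, not a definition stated in the text, and the
interconnect wiring is deliberately left out.

```python
def effective_fanout(driver, loads):
    """Ratio of driven gate area to driver gate area.

    driver and each entry of loads are (length, width) pairs in microns.
    """
    driver_area = driver[0] * driver[1]
    load_area = sum(length * width for length, width in loads)
    return load_area / driver_area


six_minimum_gates = [(3, 4)] * 6
fanout_minimum_driver = effective_fanout((3, 4), six_minimum_gates)
fanout_wide_driver = effective_fanout((3, 8), six_minimum_gates)
```

With equal channel lengths the ratio reduces to the ratio of
total load width to driver width, which is why doubling the
driver width halves the fanout figure.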

For the comparison of the different approaches to CLA
addition, the term logical event needs to be defined. The
most basic definition is a combinational logic circuit
accepting  a  set  of  inputs,  performing  its  specified  opera- 
tions on  those  inputs  and  generating  a  set  of  outputs. 
Therefore,  the  input  of  the  addends,  followed  by  the  compu- 
tation and  output  of  the  sum  can  be  considered  as  a  logical 
event.  However,  a  primary  design  consideration  for  the 
adder  is  to  provide  for  testability  and  a  key  element  of 
this  provision  is  the  availability  of  intermediate  results 
(see  section  3  of  this  chapter).  This  implies  breaking  up 
the  sum  generation  into  several  separate  events.  The  first 
event  takes  the  addends  as  inputs,  performs  some  logic  oper- 
ation (s)  on  them  and  stores  the  results  in  a  register.  The 
next  event  takes  its  inputs  from  that  register  and  stores 
its  results  in  another  register.  This  chain  continues  until 
the  last  event  deposits  the  sum  on  the  output  pads  of  the 
chip.  To  provide  the  tester  with  easily  interpreted  inter- 
mediate results,  the  equations  presented  in  this  chapter 
were  taken  as  boundaries  for  each  logical  event.  The  terms 
on  the  right  side  of  the  equation  determine  the  inputs  and 
the  left  side  terms  determine  the  output  of  a  logical  event. 
Once  all  the  inputs  for  an  equation  are  generated  by 
previous  events,  the  logic  of  the  equation  becomes  part  of 
the  current  event. 

1.  Zero Level CLA Logic

This logic requires three events to generate the
sum. First, equations 4.1 and 4.2 are used to generate the
P(i)'s and G(i)'s. Second, from equation 4.3 the C(i)'s are
generated. Finally, the sum is derived from equation 4.4.
The principal problem with this approach for a sixteen-bit
adder lies in the application of equation 4.3. Here, the
input P(1) has a fanout of 15, which makes this approach
unsatisfactory.
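The three events can be sketched in software as follows. Equations 4.1 through 4.4 are not reproduced in this section, so the sketch assumes the usual definitions: P(i) is the exclusive OR of the addend bits, G(i) is their AND, each carry is the fully expanded sum of products of the P's, G's and Cin, and S(i) is P(i) exclusive ORed with the carry into bit i. The function names are illustrative.

```python
def to_bits(value, width):
    """Return the bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def zero_level_cla(a_bits, b_bits, cin):
    """Flat (zero level) carry lookahead addition, LSB-first bit lists."""
    n = len(a_bits)
    # Event 1: propagate and generate for every bit (assumed eqns 4.1, 4.2).
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]
    # Event 2: every carry as a flattened sum of products (assumed eqn 4.3).
    carries = []
    for i in range(n):
        c = 0
        for j in range(-1, i + 1):
            term = cin if j < 0 else g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]          # low-order P's appear in many terms
            c |= term
        carries.append(c)
    # Event 3: sum bits from P and the carry into each bit (assumed eqn 4.4).
    carry_in = [cin] + carries[:-1]
    s = [pi ^ ci for pi, ci in zip(p, carry_in)]
    return s, carries[-1]


s, cout = zero_level_cla(to_bits(0xFFFF, 16), to_bits(1, 16), 0)
print(s, cout)
```

The inner loop makes the fanout problem visible: the propagate term of the lowest bit is used in a product term of every higher carry.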
2.  First Level CLA Logic

Noting that a four-bit sum generated using zero
level CLA logic is within the design guidelines suggests
cascading 4-bit slices of the same logic as indicated in
Table 2. Here the sum is available after six events and the
fanout is reduced by a factor of four. The event cycle time
reduction would more than make up for the event count
increase since cycle time grows faster than linearly with
fanout. The only drawback with this design lies in the cost
of extending it to generate 32-bit or 64-bit sums. For
every 4-bit slice added, another event is required. Thus, a
64-bit add would require 12 more events than the 16-bit add,
for a total of 18.


                              TABLE 2
              First Level CLA Logic for a 16-bit Sum

Event  Bits 1-4           Bits 5-8           Bits 9-12          Bits 13-16
No.

  1    Compute P(i),G(i)  Compute P(i),G(i)  Compute P(i),G(i)  Compute P(i),G(i)
  2    Compute C(i)       Delay P(i),G(i)    Delay P(i),G(i)    Delay P(i),G(i)
  3    Compute S(i)       Compute C(i)       Delay P(i),G(i)    Delay P(i),G(i)
  4    Delay S(i)         Compute S(i)       Compute C(i)       Delay P(i),G(i)
  5    Delay S(i)         Delay S(i)         Compute S(i)       Compute C(i)
  6    Delay S(i)         Delay S(i)         Delay S(i)         Compute S(i)

3.  Second Level CLA Logic

Again the data is divided into 4-bit slices called
blocks. But rather than let the carries ripple through the
blocks, two new primitive functions are introduced. They
are the block propagate, BP(i), and block generate, BG(i),
functions. BP(i)=1 implies that a carry into block(i) will
be propagated through to block(i+1). BG(i)=1 implies that
block(i) will generate a carry into block(i+1). For a 4-bit
block where bit(1) is the least significant bit, the BP and
BG primitives are generated by equations 4.5 and 4.6
respectively, with the P(i)'s and G(i)'s computed as before.


$$BP(i) = P(1)\,P(2)\,P(3)\,P(4) \qquad \text{(eqn 4.5)}$$

$$BG(i) = G(4) + G(3)P(4) + G(2)P(3)P(4) + G(1)P(2)P(3)P(4) \qquad \text{(eqn 4.6)}$$


Next, the block carry, BC(i), which represents the carry
from block(i) into block(i+1), is computed using equation
4.7, which represents the same logic as equation 4.3.


$$BC(i) = \sum_{j=0}^{i} BG(j) \prod_{k=j+1}^{i} BP(k) \qquad \text{(eqn 4.7)}$$


So far, after three events, the P(i)'s, G(i)'s,
BP(i)'s, BG(i)'s, and BC(i)'s have been generated. If the
same method of generating the final sum as used in zero
level CLA were to be used, two additional events would be
required. The first again applies the logic of equation 4.3
to each 4-bit block to generate the carry into each bit.
Here the Cin for block(i) is given by BC(i-1). The second
event is used to generate the sum from the C(i)'s and
P(i)'s. One of these events can be eliminated if, while the
BC(i)'s and their predecessors are being computed, an
estimated sum of the 4-bit block is also computed. One method
is to compute two estimated sums for each block, one
assuming a carry into the block of 0 and the other assuming
a carry in of 1. When the correct carry in for block(i) is
generated, it is used to multiplex the correct sum for the
block to the output. This assumed carry method was rejected
because of the large amount of area consumed by the
registers needed to hold two possible answers. The second method
is to compute the estimated sum of the block assuming a
carry-in of 0 and then correct the estimated sum once the
actual carry-in to each block is known.

Since the estimated sum, ES(i), is not needed until
after the third event and computing it as one event again
leads to fanout problems, ES(4), the most significant bit,
through ES(1) are computed in two events as
follows. First, an intermediate estimated sum, IES(i), is
computed using two-bit slices, each assuming a 0 carry in
(see equations 4.8 through 4.11). At the same time, a carry
from bit(2) into bit(3) (IC23) is computed using equation
4.12. On the next event, ES(i) is computed from the IES(i)'s
and IC23 using equations 4.13 through 4.16.


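$$IES(1) = P(1) \qquad \text{(eqn 4.8)}$$

$$IES(2) = P(2) \oplus G(1) \qquad \text{(eqn 4.9)}$$

$$IES(3) = P(3) \qquad \text{(eqn 4.10)}$$

$$IES(4) = P(4) \oplus G(3) \qquad \text{(eqn 4.11)}$$

$$IC23 = G(2) + G(1)P(2) \qquad \text{(eqn 4.12)}$$

$$ES(1) = IES(1) \qquad \text{(eqn 4.13)}$$

$$ES(2) = IES(2) \qquad \text{(eqn 4.14)}$$

$$ES(3) = IC23 \oplus IES(3) \qquad \text{(eqn 4.15)}$$

$$ES(4) = [IES(3)\,IC23] \oplus IES(4) \qquad \text{(eqn 4.16)}$$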
Now, after three events, estimated sums for each
4-bit block and the actual carry into each block (Cinb) are
available. From these the sum can be computed using
equations 4.17 through 4.20.


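$$S(1) = C_{inb} \oplus ES(1) \qquad \text{(eqn 4.17)}$$

$$S(2) = [C_{inb}\,ES(1)] \oplus ES(2) \qquad \text{(eqn 4.18)}$$

$$S(3) = [C_{inb}\,ES(1)\,ES(2)] \oplus ES(3) \qquad \text{(eqn 4.19)}$$

$$S(4) = [C_{inb}\,ES(1)\,ES(2)\,ES(3)] \oplus ES(4) \qquad \text{(eqn 4.20)}$$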


Using second level CLA logic, the 16-bit sum is
generated in only four events. Additionally, this design
can easily be extended to the generation of 64-bit sums.
The logic of equations 4.5 and 4.6, which produced the second
level primitives BP and BG, can be used again to generate
third level primitives, B3P and B3G. These third level
primitives represent the carry propagate and carry generate
properties of 16-bit slices. The carry into each 16-bit
block is provided by implementing equation 4.7. Thus,
adding one event will provide the carry into each of four
16-bit blocks of a 64-bit sum. The logic of equation 4.3 is
then used to generate the carry into each 4-bit block of the
sum and the final sum is computed as before. The final
result is that by adding two events, for a total of six, and
using the same logic as before (i.e. no new circuits need to
be designed), the 16-bit adder can be extended to a 64-bit
adder.
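The four events of the second level scheme can be checked with a small software model. This is only a sketch of the logic in equations 4.5 through 4.20, not of the circuits: it assumes P is the exclusive OR and G the AND of the addend bits, and it evaluates the block carries as a recurrence for brevity, whereas the chip evaluates the equivalent flattened sum of products of equation 4.7 in a single event. All names are illustrative.

```python
def second_level_cla_add(a, b, cin=0):
    """16-bit add using block propagate/generate and estimated sums."""
    A = [(a >> i) & 1 for i in range(16)]     # index 0 is bit(1), the LSB
    B = [(b >> i) & 1 for i in range(16)]
    # Event 1: per-bit primitives (assumed P = A xor B, G = A and B).
    P = [x ^ y for x, y in zip(A, B)]
    G = [x & y for x, y in zip(A, B)]
    # Event 2: block primitives and intermediate estimated sums.
    BP, BG, IES, IC23 = [], [], [], []
    for blk in range(4):
        p1, p2, p3, p4 = P[4 * blk:4 * blk + 4]
        g1, g2, g3, g4 = G[4 * blk:4 * blk + 4]
        BP.append(p1 & p2 & p3 & p4)                            # eqn 4.5
        BG.append(g4 | (g3 & p4) | (g2 & p3 & p4)
                  | (g1 & p2 & p3 & p4))                        # eqn 4.6
        IES.append((p1, p2 ^ g1, p3, p4 ^ g3))                  # eqns 4.8-4.11
        IC23.append(g2 | (g1 & p2))                             # eqn 4.12
    # Event 3: block carries (recurrence form of eqn 4.7) and estimated sums.
    BC, carry = [], cin
    for blk in range(4):
        carry = BG[blk] | (BP[blk] & carry)
        BC.append(carry)
    ES = []
    for (i1, i2, i3, i4), c in zip(IES, IC23):
        ES.append((i1, i2, c ^ i3, (i3 & c) ^ i4))              # eqns 4.13-4.16
    # Event 4: correct each estimated sum with the actual block carry-in.
    S = []
    for blk in range(4):
        cinb = cin if blk == 0 else BC[blk - 1]
        e1, e2, e3, e4 = ES[blk]
        S += [cinb ^ e1,
              (cinb & e1) ^ e2,
              (cinb & e1 & e2) ^ e3,
              (cinb & e1 & e2 & e3) ^ e4]                       # eqns 4.17-4.20
    return sum(bit << i for i, bit in enumerate(S)), BC[3]


print(second_level_cla_add(0x1999, 0x0888, 1))
```

Comparing the returned sum and carry-out against ordinary integer addition over many operands is a quick way to gain confidence that the reconstructed equations are mutually consistent.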
B.  DESIGN FOR TESTABILITY

Another primary objective of the adder design was to
provide for testability, that is, the ability to logically
detect fabrication errors or circuit malfunctions rather
than visually searching for faults with a microscope.

As  the  complexity  of  integrated  circuits  has  grown,  the 
ability  to  logically  detect  faults  using  only  the  normally 
available  inputs  and  outputs  has  decreased  markedly.  As 
complexity  increases,  the  number  of  likely  faults  to  be 
tested  for  and  the  number  of  input  vectors  required  to 
isolate  a  specific  fault  grow  rapidly.  Unless  a  design 
technique  is  used  which  allows  the  tester  to  examine  the 
interior logic of a chip, the order of magnitude of the
number  of  input  vectors  required  to  perform  useful  logical 
testing  is  prohibitive.  Thus,  if  logical  testability  is 
desired,  a  design  technique  that  provides  for  it  must  be 
used. 

One such design technique is level sensitive scan design
(LSSD) [Ref. 13]. Level sensitive implies that the output
of any logic element is dependent only on the levels of its
inputs. No logic elements are allowed to depend on a
transition, such as in an edge triggered flip flop. Scan design
implies that all memory elements in the design are to have
an auxiliary function where their contents are serially fed
to an output pad for examination. This gives a tester the
ability to examine intermediate results. In applying the
LSSD technique to the adder design, the following steps were
taken.

First, all circuits were designed to respond to the
level of their inputs and not to require a transition to
trigger their operation. Second, to insure that each logic
event worked only with stable, non-fluctuating input levels,
the inputs to each event were gated. The input gates were
opened only after the inputs were stable (i.e. the outputs
of the previous event were stable) and closed before the
input gates of the previous event were opened. Third, a
dual mode latch was used to store the output of each logic
event. In the normal mode of operation, the register
latches the outputs of one logic event in parallel and
stores them to be used as inputs for the next logic event.
In its secondary mode of operation, the register stops
taking its parallel inputs and starts to run as a shift
register, shifting its contents onto an output pad.

One of the consequences of using the LSSD technique is
the large amount of area consumed by the dual mode
registers. In high speed operation, an inverter pair would be
sufficient to store inter-event results. But to permit low
speed testing, where the capacitance of a gate may discharge
during one clock phase, and to provide the dual mode feature, a
pair of clocked latches with control circuits is required.

C.       LAYOUT    DESIGN 

With the logic decided upon, the next step was to create
the layout of the adder. The logic consisted of four events
to produce the sum. Another event was needed to latch the
input data onto the chip. A two-phase clock was needed to
insure that two adjacent events did not run simultaneously
(insuring stable inputs to each event). To make the output
of the adder compatible with the input to another adder, a
one event delay was added. This insures that the output of
one adder does not change while a second adder is using the
sum from the first as an input. With two 16-bit addend
inputs, one carry-in input, one power supply (Vdd) input,
one reference (GND) input, a 16-bit sum output, one
carry-out output, and two clock inputs, ten pads were left from a
standard 64-pin chip for register mode control input and
register (shift mode) output. Since the design called for
five registers, one for each logic event and one for
latching the input data, five pads were used for input of
the register mode control signals and five were used for the
registers to serially output their contents. With the
required inputs and outputs identified, the preliminary floor
plan shown in Figure 4.2 was created.

Figure 4.2   Preliminary Chip Floorplan.
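The pad budget described above is simple arithmetic, and the following sketch just tallies it. The dictionary keys are descriptive labels chosen for this illustration, not signal names from the layout.

```python
PACKAGE_PINS = 64

# Pads consumed by the functional signals listed in the text.
functional_pads = {
    "addend A": 16,
    "addend B": 16,
    "carry-in": 1,
    "Vdd": 1,
    "GND": 1,
    "sum": 16,
    "carry-out": 1,
    "clocks (two phases)": 2,
}

used = sum(functional_pads.values())
spare = PACKAGE_PINS - used

REGISTERS = 5                      # four logic events plus the input latch
control_pads = REGISTERS           # one mode control input per register
serial_pads = REGISTERS            # one serial output per register

print(used, spare, control_pads + serial_pads)   # 54 10 10
```

The ten spare pads are exactly consumed by one control input and one serial output for each of the five registers.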
The first circuit designed was the dual mode latch of
Figure 4.3. Here the circuit is designed to latch the IN
level when Control is low (its complement is high) and phi1
is high (its complement is low). When phi1 goes low, a copy
of the input is also stored in the second latch and becomes
available at shift-out, which is connected to shift-in of the
next latch. When Control goes high, the IN signal is blocked
and the latch takes its input from the latch to its left. The
shift-in of the leftmost latch in a register is tied to
ground. Versatec plots of the actual layouts of this dual
mode latch and the other circuits described in this section
are given in Appendix E.

Figure 4.3   Dual Mode Latch.
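A behavioral sketch of a register built from such latches may help in reading the serial outputs later. It models only the two modes at the register level (parallel load when the control input is low, shift toward the output when it is high, with the leftmost shift-in tied to ground); the two clock phases and the internal latch pair are not modeled, and the class and method names are hypothetical.

```python
class DualModeRegister:
    """Register-level model of a row of dual mode latches."""

    def __init__(self, width):
        self.bits = [0] * width          # index 0 is the leftmost latch

    def clock(self, control, parallel_in=None):
        """Advance one clock cycle in the mode selected by control."""
        if control == 0:
            if parallel_in is None or len(parallel_in) != len(self.bits):
                raise ValueError("normal mode needs one input per latch")
            self.bits = list(parallel_in)            # normal mode: latch IN
        else:
            self.bits = [0] + self.bits[:-1]         # shift mode: ground in

    @property
    def serial_out(self):
        """The rightmost latch drives the register's output pad."""
        return self.bits[-1]


reg = DualModeRegister(4)
reg.clock(0, [1, 0, 1, 1])
seen = []
for _ in range(6):
    seen.append(reg.serial_out)
    reg.clock(1)
print(seen)   # [1, 1, 0, 1, 0, 0]
```

Because ground is shifted in at the left, the serial output settles to 0 once the stored contents have been read out.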

The AND gate used was constructed from a NAND gate
followed by an inverter as shown in Figure 4.4. Similarly,
the OR gate was constructed from a NOR gate followed by an
inverter (see Figure 4.5). Although logic implemented using
these AND and OR gates is more area consuming than the same
logic implemented in NAND and NOR gates only, the penalty is
not severe because they were used infrequently in the final
design.


Figure 4.4   AND Gate.

Figure 4.5   OR Gate.

The exclusive OR gate (XOR) was constructed from two
inverters and three NAND gates as shown in Figure 4.6.
Though this design is considerably more area consuming than
the XOR gate of Figure 3.1, it was selected because the RNL
circuit simulator could correctly model its operation.


Figure 4.6   Exclusive OR Gate.

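One common way to build an exclusive OR from two inverters and three NAND gates is sketched below. Figure 4.6 is not reproduced here, so the particular wiring is an assumption; the sketch only shows that this gate count is sufficient.

```python
def inv(x):
    """Inverter."""
    return 1 - x


def nand(x, y):
    """Two-input NAND gate."""
    return 1 - (x & y)


def xor_gate(a, b):
    """XOR from two inverters and three NAND gates (assumed wiring)."""
    not_a = inv(a)
    not_b = inv(b)
    # NAND(a, b') and NAND(a', b) feed a third NAND, giving ab' + a'b.
    return nand(nand(a, not_b), nand(not_a, b))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_gate(a, b))
```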
More complex logic functions were implemented using
programmed logic arrays (PLA) where the outputs are the
logical sum (OR) of the products (AND) of inputs. A single
phase design was needed: a PLA designed to compute when
phi1 is high had to produce the proper sum-of-products
results between the time the preceding event had produced
stable outputs (phi2 going low) and the time phi1 goes low.
To hold down fanout, a dynamic structure was needed so that
inputs could be applied to a single type of transistor. To
prevent steady state power consumption a precharged dynamic
structure was needed. Because of charge sharing, the
precharging must take place while the inputs are present on the
transistor gates of the PLA (see chapter 5, section C, for a
complete explanation of the charge sharing problem in this
PLA structure). Thus, two distinct events must occur during
this time period. First, the inputs must be applied and
precharging must take place. Then evaluation must occur.
To cause these two events to occur during a single phase of
the clock, the inter-phase time when both phi1 and phi2 are
low must be utilized for precharging. The basic structure
of the resulting PLA is shown in Figure 4.7.


Figure 4.7   PLA Structure.

Referring back to the floorplan in Figure 4.2, the
layouts of the circuits which perform the logic of each event
are presented in Appendix E. The names assigned to the
layouts are given below. Event 1 consists of a 33-bit
dual-mode latch. Event 2, which computes the P and G primitives
for each bit, is made up of 16 AND gates, 16 XOR gates, and
another 33-bit latch. Event 3, which computes the BP and BG
primitives, the IES(i)'s and the IC23 for each 4-bit block,
is made up of four instances of PLA82 and a 29-bit latch.
The circuit PLA82 is made up of an 8-input, 5-product,
2-output PLA, two XOR gates, one AND gate, and one OR gate.
Event 4, which computes the ES(i)'s and BC for each 4-bit
block, uses four instances of PLA84 to compute the ES(i)'s,
one instance of PLA915 to compute the BC(i)'s, and a
21-bit latch. The circuit PLA915 is a 9-input, 15-product,
5-output PLA and the circuit PLA84 is an 8-input, 7-product,
4-output PLA. Event 5 uses four instances of PLA104 to
compute the S(i)'s and a 17-bit latch to store results and
provide the added delay (by taking the output from the shift
out position, the extra clock cycle of delay is generated).
The circuit PLA104 is a 10-input, 14-product, 4-output PLA.
With this design, the input to output latency is three full
cycles of a two-phase non-overlapping clock; three cycles of
the clock elapse between the time the addends are presented
to the chip and the time the sum becomes available at the
output. In the first three registers the odd number of bits
is due to the need to store the carry-in value until event
4. In the last two registers the odd number of bits is due
to the need to store the computed value of carry-out.
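The register widths quoted above can be cross-checked against the quantities each event has to hold. The breakdown below is an interpretation rather than something stated outright in the text: in particular, the contents assumed for the event 4 register (sixteen ES bits plus Cin, BC1 through BC3 and Cout) follow the serial output listing in Table 3.

```python
BITS = 16
BLOCKS = BITS // 4

register_widths = {
    1: BITS + BITS + 1,                  # A(i), B(i), Cin
    2: BITS + BITS + 1,                  # P(i), G(i), Cin
    3: BLOCKS * (1 + 1 + 4 + 1) + 1,     # per block: BP, BG, 4 IES, IC23; plus Cin
    4: BITS + 1 + (BLOCKS - 1) + 1,      # ES(i), Cin, BC1-BC3, Cout (assumed)
    5: BITS + 1,                         # S(i), Cout
}

print(register_widths)   # {1: 33, 2: 33, 3: 29, 4: 21, 5: 17}

# Five registered events plus the one-event output delay, with successive
# events on alternating clock phases, give the stated three-cycle latency.
phases = 5 + 1
print(phases // 2)       # 3
```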

The resulting final layout of Figure 4.8 shows the
actual on-chip layout locations of each event's logic. In
addition to the logic circuits for each event, the circuits
AMP and AMP5 are also seen. These are driver circuits for
the high fanout control and clock signals. Each takes as
its input a control signal and produces as outputs the
control signal and its inverse, both driven by 3-micron x
160-micron transistors. This amplifier is the same design
used by the output pads to drive off chip loads.

This final layout represents one implementation of a
pipelined CLA adder designed for testability. The relative
merits of this design and others that may have been
implemented can, as yet, only be qualitatively discussed. The
addition of SPICE 2G7 to the CAD toolbag will provide future
CMOS designers with the quantitative analysis necessary to
make decisions involving tradeoffs among primary design
objectives.

Figure 4.8   Final Layout.
This final design, when simulated using RNL, functioned
properly at clock speeds up to 14 megahertz. Testing of the
actual chips produced by MOSIS should give an indication of
the accuracy of RNL's predictions. The following chapter
presents a test plan to check for proper operation of the
adder at low clock rates and to determine the maximum
operating speed.
V.  TEST PLAN

After several iterations of the design-simulate-redesign
loop, a final layout was achieved for the 16-bit pipelined
adder. These iterations provide considerable confidence in
the logical correctness of the layout. Appendix D contains
RNL simulation results for the full adder. In reading these
results it should be kept in mind that the adder requires
three cycles of the two-phase clock to produce the sum. In
the first part of the simulation, the inputs were kept
constant for three clock cycles to facilitate easier reading
of the results. With these steady inputs, simulations were
run to verify the generation of correct sums, concentrating
on those addends that would produce carry propagates and
carry generates across the boundaries of the 4-bit blocks.
The last part of the simulation utilized different inputs
each clock cycle. This was done to test the pipelining
feature of the design, insuring no dependence on repeated
inputs of the addends to produce the proper sum.

After  fabrication  of  the  chip,  application  of  similar 
inputs  to  make  the  same  determinations  for  the  actual 
circuits  will  form  the  initial  portion  of  the  test  plan.  In 
this chapter a test plan for the verification of computational
correctness and speed will be presented.

A.       INPUTS    AND    OUTPUTS 

The  first  step  in  testing  the  chip  will  be  to  connect  it 
to  the  required  input  and  output  circuitry.  To  accomplish 
this,  the  identity  of  the  inputs  and  outputs  on  each  pin 
must  be  determined.  Microscopic  examination  of  the  chip 
will reveal the logo "16-bit Add", located between the GND
and Vdd buses for the pads in the northeast corner (see
Figure 4.8, which is repeated below for convenience). Using
this landmark, the signals on the pads can be labeled as
follows.


Figure 4.8 (repeated)   Final Layout.

The western edge has sixteen input pads for the addend
A, with the least significant bit, A(1), located at the
northern end. The northern edge of the chip also has
sixteen input pads for the addend B, with the least
significant bit, B(1), located at the eastern end. The southern
edge has fourteen output pads and two input pads. At its
western end is the GND input pad followed by fourteen output
pads for S(16), the most significant bit of the sum, through
S(3). Following S(3), at the eastern end, is the input pad
for Vdd. The eastern edge of the chip has eight input pads
and eight output pads. Starting at the northern end, there
are input pads for phi1, phi2, Cin, CON1 (control signal for
the dual mode register of event 1), CON2, CON3, CON4, and
CON5. They are followed by output pads for SREG1 (serial
output from dual mode register of event 1), SREG2, SREG3,
SREG4, SREG5, Cout, S(2), and S(1) at the southern end.

To  supply  power  to  the  chip,  +5  volts  DC  should  be 
applied  to  the  Vdd  pad  and  0  volts  to  the  GND  pad.  All 
logical  inputs  including  clocks  and  control  signals  should 
be  either  GND  for  a  logical  0  or  Vdd  for  a  logical  1. 
Simulation with RNL revealed some restrictions on the clock
signals. For proper operation, each clock should remain
high for a minimum of 20 nanoseconds and the clock
interphase time, when both phi1 and phi2 are low, must be at
least 10 nanoseconds in duration. For initial testing, to
insure that charge sharing problems caused by too short an
interphase time, and fanout problems caused by too short a
clock phase duration, are not interpreted as fabrication
errors, the clock speed should be adjusted so that both of
the above clock parameters are exceeded by one order of
magnitude.
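The clock settings for initial testing follow directly from the two simulated minimums. The sketch below assumes a symmetric two-phase non-overlapping clock, so that one full cycle consists of phi1 high, an interphase gap, phi2 high, and a second gap; that cycle structure and the function names are assumptions made for illustration.

```python
MIN_PHASE_HIGH_NS = 20.0     # minimum time each clock stays high (RNL)
MIN_INTERPHASE_NS = 10.0     # minimum time both clocks are low (RNL)


def clock_period_ns(phase_high_ns, interphase_ns):
    """Full cycle: phi1 high, gap, phi2 high, gap (assumed symmetric)."""
    return 2.0 * (phase_high_ns + interphase_ns)


def initial_test_clock(margin=10.0):
    """Scale both minimums by one order of magnitude for first tests."""
    high = MIN_PHASE_HIGH_NS * margin
    gap = MIN_INTERPHASE_NS * margin
    period = clock_period_ns(high, gap)
    return high, gap, period, 1000.0 / period    # frequency in MHz


high, gap, period, mhz = initial_test_clock()
print(high, gap, period, round(mhz, 2))   # 200.0 100.0 600.0 1.67
```

Under these assumptions the first tests would be run with a clock somewhat below 2 megahertz, well away from either limit.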

The outputs, like the inputs, are at Vdd to represent a
logical 1 and at GND to represent a logical 0. The circuits
used to measure the outputs should have high input
impedance, on the order of one megohm. The output pads of
the adder are not designed to handle the current source and
sink requirements of transistor-transistor logic integrated
circuits. The output measurement circuits should be
constructed using NMOS or CMOS devices that are designed to
operate between +5 volts DC and ground.

B.  TESTING FOR CORRECT OPERATION

After connecting the adder to a test harness, the next
step is to verify the generation of correct sums by the
adder. There are several inputs that should be included in
the testing to verify the correct operation of individual
circuits. These are contained in Appendix F. In addition
to the test vectors of Appendix F, several randomly selected
input vectors should be tested. If the adder should fail to
generate correct sums, the LSSD features can be employed to
examine intermediate results.

1.  Intermediate Results

With the LSSD design, a tester can leave input
levels constant for a long period of time and use the shift
mode of the internal registers to examine the internal state
of the chip. The rightmost bit of each register is always
available at the output pad for that register. To obtain
the contents of the other bits, the control signal for the
given register is set to and held at logical 1 while the
clock continues to run. For registers 1, 3, and 5 the
serial output will be meaningful and stable while phi2 is
high. The serial output of registers 2 and 4 will be stable
when phi1 is high. Table 3 lists in order the intermediate
values available at the SREG(n) output pad when the input
CONn is high.
                         TABLE 3
                 Register Serial Outputs

Clock
Cycle   SREG1   SREG2   SREG3    SREG4   SREG5

  0     B1      P1      BP1      Cin     S1
  1     B2      P2      IES3     BC2     S3
  2     B3      P3      IES4     Cout    S5
  3     B4      P4      BG2      ES2     S7
  4     B5      P5      IES5     ES4     S9
  5     B6      P6      IES6     ES6     S11
  6     B7      P7      IC67     ES8     S13
  7     B8      P8      BP3      ES10    S15
  8     B9      P9      IES11    ES12    0
  9     B10     P10     IES12    ES14    Cout
 10     B11     P11     BG4      ES16    S2
 11     B12     P12     IES13    BC1     S4
 12     B13     P13     IES14    BC3     S6
 13     B14     P14     IC1415   ES1     S8
 14     B15     P15     BG1      ES3     S10
 15     B16     P16     IES1     ES5     S12
 16     A1      G1      IES2     ES7     S14
 17     A2      G2      IC23     ES9     S16
 18     A3      G3      BP2      ES11    0
 19     A4      G4      IES7     ES13    0
 20     A5      G5      IES8     ES15    0
 21     A6      G6      BG3      0       0
 22     A7      G7      IES9     0       0
 23     A8      G8      IES10    0       0
 24     A9      G9      IC1011   0       0
 25     A10     G10     BP4      0       0
 26     A11     G11     IES15    0       0
 27     A12     G12     IES16    0       0
 28     A13     G13     Cin      0       0
 29     A14     G14     0        0       0
 30     A15     G15     0        0       0
 31     A16     G16     0        0       0
 32     Cin     Cin     0        0       0
 33     0       0       0        0       0
 34     0       0       0        0       0
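As an illustration of how the serial listing might be used, the following sketch reassembles the latched addends from the bits observed at the SREG1 pad. It assumes the tester has already captured one bit per clock cycle in the order shown in the SREG1 column (B1 through B16, then A1 through A16, then Cin); the capture mechanism and the function names are hypothetical.

```python
def decode_sreg1(stream):
    """Rebuild (A, B, Cin) from 33 bits shifted out of register 1.

    stream[0] is the bit seen at clock cycle 0 (B1, the least
    significant bit of B); stream[32] is Cin.
    """
    if len(stream) < 33:
        raise ValueError("register 1 holds 33 bits")
    b = sum(bit << i for i, bit in enumerate(stream[0:16]))
    a = sum(bit << i for i, bit in enumerate(stream[16:32]))
    return a, b, stream[32]


def encode_sreg1(a, b, cin):
    """Expected SREG1 stream for given addends, for comparison."""
    b_bits = [(b >> i) & 1 for i in range(16)]
    a_bits = [(a >> i) & 1 for i in range(16)]
    return b_bits + a_bits + [cin]


stream = encode_sreg1(0x1999, 0x0888, 1)
print(decode_sreg1(stream) == (0x1999, 0x0888, 1))   # True
```

Comparing the decoded values against the inputs actually applied is a direct check that the input latch and its shift path are working before later registers are examined.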

C.   TESTING  FOR  SPEED  OF  OPERATION 

Once the chips containing fabrication errors have been
culled from the chip set returned by MOSIS, the task
remaining is to determine just how fast the adder can run.
Rather than simply increasing the clock rate until the adder
fails, the duration of the time each of phi1 and phi2 is high
and the interphase time should be reduced separately. RNL
simulation indicates that the circuit which generates S4
within PLA104 is the limiting circuit for clock phase
duration (i.e. it requires the longest time to correctly
evaluate its inputs). RNL simulation also indicates that
the circuits in PLA104 which generate S1 and S4 are the
limiting circuits for the clock interphase duration.

Since  the  PLA  is  constructed  of  precharged  dynamic 
circuits,  the  evaluation  clock  phase  must  be  long  enough  to 
allow  the  inputs  to  drive  the  outputs  to  their  proper 
values,  even  if  the  inputs  are  the  same  as  those  of  the 
previous  evaluation  cycle.  This  allows  the  tester  to  use  a 
constant  input  as  the  duration  of  each  clock  phase  is 
reduced  until  the  adder  produces  incorrect  results. 

Determination of the clock interphase duration limit is
more difficult. This is because the inputs to a PLA must be
changing for charge sharing problems to occur.

Figure 5.1   Charge Sharing in a PLA.

For example, in Figure 5.1 assume that the first set of inputs
is in1=1, in2=0, and that this is correctly evaluated to
produce out=0 when phi1 is high. Now assume that the next
input is in1=0 and in2=1, which should also evaluate to
out=0. However, if the precharge time (when the inputs are
present on the gates of Q2 and Q3 and phi1 is still low) is
insufficient, C2 will not be charged to Vdd when precharging
ends (C2 was discharged to zero volts during the previous
evaluation when in1 was high and phi1 was high). Now, when
evaluation begins (phi1 going high) the low voltage across
C2 causes Q5 and Q6 to interpret their input as a logical 0.
As a result the output of the Q5-Q6 inverter pair goes high,
causing Q8 to turn on, discharging C4 and resulting in an
output of logical 1, which is incorrect. Table 4 lists the
proper evaluation sequence when precharge time is sufficient
and the improper sequence due to insufficient precharge
time. In this table, for the inputs, output, and capacitor
voltages a 1 indicates Vdd, 0 indicates GND, and X indicates
somewhere in between. For the transistors, a 1 indicates
on, a 0 indicates off, and an X indicates neither fully on
nor fully off. Subsequent inputs of in1=0 and in2=1 may
produce correct results since, with constant inputs, each
precharge time will add more charge to C2 until there is
sufficient charge to allow the output of the Q5-Q6 inverter
to remain low.

TABLE 4
PLA Evaluation Sequences

Proper evaluation sequence:

phi1 phi2   in1 in2   C1-C4   Q1-Q10       out

 1    0      1   0    0011    1100010001    0
 0    0      1   0    0011    0101011001    0
 0    1      0   1    0011    0101011001    0
 0    0      0   1    0111    0011011001    0
 1    0      0   1    0111    1010010001    0

Improper evaluation sequence:

phi1 phi2   in1 in2   C1-C4   Q1-Q10       out

 1    0      1   0    0011    1100010001    0
 0    0      1   0    0011    0101011001    0
 0    1      0   1    0011    0101011001    0
 0    0      0   1    0X11    0011011001    0
 1    0      0   1    0XX0    1010XX0X10    1
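
The recovery described above, in which repeated precharge
periods gradually restore the charge on C2, can be sketched
numerically. The following Python fragment is only an
illustration: it assumes a simple single-pole RC charging
model, and the supply voltage, inverter threshold, and time
constant are hypothetical values, not figures extracted from
the adder layout.

```python
import math

def precharge_voltage(v_start, vdd, t_pre, rc):
    """Voltage after charging toward vdd for t_pre with time constant rc."""
    return vdd - (vdd - v_start) * math.exp(-t_pre / rc)

def cycles_until_valid(vdd, v_threshold, t_pre, rc, max_cycles=100):
    """Count precharge periods needed before the node exceeds v_threshold.

    The node starts fully discharged (0 V) and is assumed not to be
    discharged again between precharge periods (constant inputs).
    """
    v = 0.0
    for n in range(1, max_cycles + 1):
        v = precharge_voltage(v, vdd, t_pre, rc)
        if v >= v_threshold:
            return n, v
    return None, v

# Hypothetical values: 5 V supply, 2.5 V inverter threshold, RC = 10 ns.
short_pre = cycles_until_valid(vdd=5.0, v_threshold=2.5, t_pre=5.0, rc=10.0)
long_pre = cycles_until_valid(vdd=5.0, v_threshold=2.5, t_pre=30.0, rc=10.0)
print(short_pre[0], long_pre[0])
```

With the short precharge time the first evaluation sees a
voltage below the threshold and only the second is valid,
which mirrors the behavior of the improper sequence in
Table 4.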

Thus, to check for charge sharing problems in the
circuit of Figure 5.1, the inputs must alternate. Likewise,
in PLA104 to check for charge sharing errors in output S1,
its inputs must alternate between ES1=0, BC=0 and ES1=1,
BC=1 as the interphase time is reduced. This can be
accomplished for all four instances of PLA104 simultaneously
by alternating inputs of

A   = 0001 1001 1001 1001
B   = 0000 1000 1000 1000
Cin = 1

and

A   = 0000 0000 0000 0000
B   = 0000 0000 0000 0000
Cin = 0

To check for charge sharing errors in S4, the inputs to
PLA104 must cycle between BC=1, S4=0, S3=S2=1, S1=0 and
BC=0, S4=0, S3=S2=S1=1. This may be accomplished for all four
instances of PLA104 simultaneously by alternating inputs of

A   = 0110 1110 1110 1110
B   = 0000 1000 1000 1000
Cin = 1

and

A   = 0111 0111 0111 0111
B   = 0000 0000 0000 0000
Cin = 0

This maximum speed testing assumes that RNL has correctly
identified the slowest circuits on the chip. RNL
simulations have indicated that the next slowest circuit
(PLA915) is at least 20% faster than PLA104 (16.0 nsec for
PLA915 vs. 20.1 nsec for PLA104). Also, all other PLA's
functioned properly with a 5 nsec interphase time.
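
The 20% figure can be checked from the two simulated delays.
The short Python sketch below shows the arithmetic under two
possible readings of "faster"; which reading the simulation
summary intends is an assumption here.

```python
def delay_reduction(slowest_ns, other_ns):
    """Fraction by which other_ns is shorter than slowest_ns."""
    return (slowest_ns - other_ns) / slowest_ns

def speed_ratio_gain(slowest_ns, other_ns):
    """Fractional increase in speed (1/delay) of the faster circuit."""
    return slowest_ns / other_ns - 1.0

pla104_ns = 20.1  # slowest circuit, from the RNL simulation
pla915_ns = 16.0  # next slowest circuit, from the RNL simulation

print(f"delay reduction: {delay_reduction(pla104_ns, pla915_ns):.1%}")
print(f"speed gain:      {speed_ratio_gain(pla104_ns, pla915_ns):.1%}")
```

Either way the margin is at least 20%, so PLA104 should
fail first as the clock is shortened if the simulation is
accurate.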

Should  PLA104  prove  to  be  the  speed  limiting  circuit  for 
the  chip,  the  actual  failure  speeds  of  the  chip  can  serve  as 
an  indication  of  the  accuracy  of  the  RNL  simulation  for 
future  designs. 
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VI.    CONCLUSIONS 

The experience gained in the design of the adder, coupled
with the clarity of hindsight, leads to the following
conclusions and recommendations.

A.  THE    CMOS    TECHNOLOGIES 

The CMOS technologies will play a role of steadily
increasing importance in the VLSI designs of the future.
MOSIS is already offering, on an experimental basis, CMOS
Bulk p-well fabrication with a one-micron minimum feature
size. A scalable set of design rules is being developed to
allow initial fabrication in 3-micron CMOS for design
verification before the far more expensive 1-micron process
is used.

In  the  private  sector  there  is  considerable  research 
aimed  at  finding  an  insulating  substrate  material  that  does 
not  have  the  variability  and  thermal  problems  of  sapphire. 
Progress  in  this  area  will  remove  the  drawback  caused  by 
latchup    tendencies    in  CMOS    Bulk. 

B.  CMOS   CAD    TOOLS 

Though the design tools currently available at NPS
constitute a complete set for the design of CMOS Bulk p-well
circuits, the recent CAD tool set released by the
University of Washington/Northwest VLSI Consortium, Release
2.0 [Ref. 11], coupled with University of California at
Berkeley Winter 1983 CAD tools, represents a more complete
and cohesive set for CMOS design. When sufficient disk
space on the VAX 11/780 becomes available to load
Release 2.0, implementation of the Release 2.0 CAD package
is highly recommended. An added benefit of installing the
Release 2.0 package is the cell library provided. The
library contains several basic standard cells with known
performance characteristics. The library also contains the
standard pad frames used by MOSIS. Though MOSIS does not
require the use of standard pad frames on designs submitted,
their use does speed up fabrication.

As mentioned earlier, as soon as SPICE 2G7 is available,
its addition to the CAD toolbag would be most advantageous
to a CMOS designer.

C.  DESIGN OF THE ADDER

If the design of the adder were to be undertaken again,
a different approach to generating the sum would probably
be used, especially if the new CAD tools mentioned
above were available. The logic approach to the computation
would still involve CLA addition, but it would be
accomplished using combinational logic and library cells
rather than PLA's. Testability would probably suffer greatly,
but effort would be made to reduce the sum generation to two
logical events. Though the level of testability provided by
the current design should provide considerable insight into
CMOS Bulk p-well performance and CAD tool accuracy, there
would be no need to repeat the investigation.
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APPENDIX    A 
SPICE MODEL CARDS FOR 3-MICRON CMOS-PW DEVICES

CMO*    models   for    MOSIS    3-micron   CMOS   Bulk   p-well   devices: 

Fast Models

.model n nmos vto=0.4 tox=0.7e-7 lambda=1e-7 ld=1e-6
+xj=1.1e-6 gamma=.3 uo=500 cbd=5e-4 cbs=5e-4

.model p pmos vto=-.4 tox=0.7e-7 lambda=1e-7 ld=1e-6
+xj=1.1e-6 gamma=.3 uo=300 cbd=3.5e-4 cbs=3.5e-4

Slow Models

.model n nmos vto=1.0 tox=0.8e-7 lambda=1e-7 ld=.5e-6
+xj=0.6e-6 gamma=1.3 uo=400 cbd=6e-4 cbs=6e-4

.model p pmos vto=-1.0 tox=0.8e-7 lambda=1e-7 ld=.5e-6
+xj=0.6e-6 gamma=.9 uo=200 cbd=4.1e-4 cbs=4.1e-4


MIT Models for MOSIS 3-micron CMOS Bulk p-well devices:

Slow - Slow

.model nss nmos level=2 rsh=20 tox=650e-10 ld=.25e-6
+xj=.35e-6 cj=6e-4 cjsw=4e-10 uo=475 vto=1.2
+cgso=1.3e-10 cgdo=1.3e-10 nsub=1.5e16
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.25

.model pss pmos level=2 rsh=80 tox=650e-10 ld=.25e-6
+xj=.35e-6 cj=4.1e-4 cjsw=2.5e-10 uo=190 vto=-1.2
+cgso=1.3e-10 cgdo=1.3e-10 nsub=5e15 tpg=-1
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.15

Fast p-type   Slow n-type

.model nfs nmos level=2 rsh=30 tox=600e-10 ld=.25e-6
+xj=.35e-6 cj=6.0e-4 cjsw=4.0e-10 uo=475 vto=1.2
+cgso=1.9e-10 cgdo=1.9e-10 nsub=1.5e16
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.25

.model pfs pmos level=2 rsh=20 tox=600e-10 ld=.40e-6
+xj=.60e-6 cj=2.0e-4 cjsw=1.0e-10 uo=270 vto=-0.6
+cgso=2.0e-10 cgdo=2.0e-10 nsub=0.3e15 tpg=-1
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.0 ucrit=8e4 uexp=.15

Fast p-type   Fast n-type

.model nff nmos level=2 rsh=10 tox=550e-10 ld=.40e-6
+xj=.60e-6 cj=3.0e-4 cjsw=2.0e-10 uo=675 vto=0.6
+cgso=2.5e-10 cgdo=2.5e-10 nsub=0.5e16
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.25

.model pff pmos level=2 rsh=20 tox=550e-10 ld=.40e-6
+xj=.60e-6 cj=2.0e-4 cjsw=1.0e-10 uo=270 vto=-0.6
+cgso=2.5e-10 cgdo=2.5e-10 nsub=0.3e15 tpg=-1
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.0 ucrit=8e4 uexp=.15

Slow p-type   Fast n-type

.model nsf nmos level=2 rsh=10 tox=600e-10 ld=.40e-6
+xj=.60e-6 cj=3.0e-4 cjsw=2.0e-10 uo=675 vto=0.6
+cgso=2.0e-10 cgdo=2.0e-10 nsub=0.5e16
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.5 ucrit=8e4 uexp=.25

.model psf pmos level=2 rsh=80 tox=600e-10 ld=.25e-6
+xj=.35e-6 cj=4.1e-4 cjsw=2.5e-10 uo=190 vto=-1.2
+cgso=1.2e-10 cgdo=1.2e-10 nsub=5.0e15 tpg=-1
+vmax=5e4 pb=.7 mj=.5 mjsw=.5
+neff=2.0 ucrit=8e4 uexp=.15
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APPENDIX    B 
UNIX MANUAL ENTRY FOR RULEC

RULEC(CAD)        CAD Toolbox User's Manual        RULEC(CAD)


NAME

rulec - Compile design rules for Lyra

SYNOPSIS

rulec [-lo] rules

DESCRIPTION

Rulec is a shell script with the following processing steps:

i)    The actual Lyra rule compiler is invoked to translate the symbolic rule
      description, rules.r, to lisp code, rules.l.

ii)   The lisp compiler, Liszt, is invoked to compile rules.l to rules.o.

iii)  rules.o is loaded into Lyra.proto to generate an executable lisp Lyra,
      rules.

iv)   The intermediate files rules.l and rules.o are deleted.

The following options are supported:

-l    (load only) No compilation is done. Previously compiled rules, rules.o,
      are loaded into Lyra.proto to generate an executable Lyra, rules. This
      option is useful mainly at Berkeley, where Lyra.proto changes frequently.

-o    (save object) Name.o is not removed. Enables "rulec -l rules" in the
      future.

FILES

~cad/bin/rulec - rulec shell script.

~cad/lib/lyra/Rulec.l - lisp rule compiler

~cad/lib/lyra/Lyra.proto - Lyra sans compiled rules code.

~cad/lib/lyra/*.r - standard rulesets.

~cad/lib/lyra/DEFAULTS - gives default rulesets for Caesar technologies.

SEE ALSO

Lyra (CAD)
Liszt (1)

AUTHOR

Michael Arnold.
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APPENDIX    C 
PRESIM USER'S GUIDE


Config file: used to calibrate RNL

capm2a  .00000
capm2p  .00000
capma   .00006
capmp   .00000
cappa   .00006
cappp   .00000
capda   .00010
capdp   .00060
cappda  .00010
cappdp  .00060
capga   .00057
lambda  1.0
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PRESIM  User's  Guide 


UW/NW VLSI Consortium

Department of Computer Science

University of Washington

Seattle, WA 98195


(This document is based on portions of the document "User's Guide to NET, PRESIM and
RNL/NL," by Christopher J. Terman, Laboratory for Computer Science, M.I.T., Cambridge, MA
02139.)

One must first convert the sim file to a network file suitable for use by RNL or NL - to do this
we run PRESIM:

presim foo.sim foo [config] options...

which converts the file foo.sim into a binary file for RNL/NL called foo.

The -f option:

Suppresses the sum-of-products formation. This may be desired if you think
sum-of-products is formed wrong; otherwise the advantages of the transistor and
node reduction make this option unattractive.

The -c option:

-cfile,minvalue

writes a list of node names and capacitances to the specified file. Only capacitances larger than
minvalue will be included.

The -t option:

-tfile,minvalue

writes a list of transistors and RC values to the specified file - there are two entries for each
transistor. The R's come from the size of the transistor, C's from the source/drain capacitance. Only RC
values larger than minvalue will be included.

The -p option:

-presist .voltage

provides a worst-case estimate of the circuit power consumption by assuming that all the pullups
(DEP or LOWP devices with drain-VDD) are all on simultaneously. "Voltage" specifies the supply
voltage, for example "-p5" specifies a VDD of 5 volts. The result is printed after PRESIM completes its
other processing. When figuring the resistance of a pullup device the "power" characteristic resistance
as set in the config file is used.

The optional third file (config) specifies various electrical parameters. The internal values (the
defaults) are a generic set. They do not reflect any particular fabrication process. (UW/NW VLSI
NOTE: A configuration file is provided in the source code that duplicates the internal settings as an
example of how this file could be used. In addition we note that the resistor values are stored first
sorted by width, then by length, not by the ratio. Values not explicitly provided in the configuration
file are estimated by linear interpolation.) The format of this file is lines of the form

parameter value comments...

Lines beginning with ';' are treated as all comment. The parameter names and their default values
are:

; configuration file for "standard" MFC process

capm2a      .00000  ; 2nd metal capacitance - area, pf/sq-micron
capm2p      .00000  ; 2nd metal capacitance - perimeter, pf/micron
capma       .00003  ; 1st metal capacitance - area, pf/sq-micron
capmp       .00000  ; 1st metal capacitance - perimeter, pf/micron
cappa       .00004  ; poly capacitance - area, pf/sq-micron
cappp       .00000  ; poly capacitance - perimeter, pf/micron
capda       .00010  ; n-diffusion capacitance - area, pf/sq-micron
capdp       .00060  ; n-diffusion capacitance - perimeter, pf/micron
cappda      .00010  ; p-diffusion capacitance - area, pf/sq-micron
cappdp      .00060  ; p-diffusion capacitance - perimeter, pf/micron
capga       .00040  ; gate capacitance - area, pf/sq-micron

lambda      2.5     ; microns/lambda (conversion from .sim file units
                    ; to units used in cap parameters)

lowthresh   0.3     ; logic low threshold as a normalized voltage
highthresh  0.8     ; logic high threshold as a normalized voltage

cntpullup   0       ; <>0 means that the capacitor formed by gate of
                    ; pullup should be included in capacitance of output
                    ; node

diffperim   0       ; <>0 means do not include diffusion perimeters
                    ; that border on transistor gates when figuring
                    ; sidewall capacitance (*)

subparea    0       ; <>0 means that poly over transistor region will not
                    ; be counted as part of the poly-bulk capacitor (*)

diffext     0       ; diffusion extension for each transistor, i.e., each
                    ; transistor is assumed to have a rectangular source
                    ; and drain diffusion extending diffext units wide and
                    ; transistor-width units high. The effect of the
                    ; diffusion extension is to add some capacitance to
                    ; the source and drain node of each transistor -
                    ; useful when processing the output of NET to improve
                    ; the capacitive loading approximations without adding
                    ; explicit load capacitors. diffext is specified in
                    ; lambda (it will be converted using the lambda factor
                    ; above).

resistance channel context width length resist
                    ; this command specifies the equivalent resistance for a transistor
                    ; of type channel with the specified width and length. Transistors
                    ; matching this entry will have the specified resistance; linear
                    ; interpolation is done if the width and/or length is not matched
                    ; exactly.
                    ;
                    ; channel is one of "enh", "dep", "intrinsic", "low-power",
                    ; "pullup", or "p-chan"
                    ;
                    ; context is one of "static", "dynamic-high", "dynamic-low", or "power"
                    ;
                    ; width is given in lambda
                    ;
                    ; length is given in lambda
                    ;
                    ; resist is given in ohms

(*) These parameters should be 1 only when processing the output of
the node extractor. They cause various corrections to be made
to the interconnect component of a node's capacitance - usually
only extracted sim files have information regarding interconnect
capacitance.

PRESIM uses these parameters in calculating the capacitance for each electrical node and the
resistance for each transistor channel.
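
As a rough illustration of how such a configuration file might be read and applied, the Python
sketch below parses "parameter value comments" lines and estimates the capacitance of one
rectangular n-diffusion region as an area term plus a perimeter term. The function names and the
4 by 10 lambda rectangle are hypothetical, and the area-plus-perimeter sum is an assumption
suggested by the parameter units, not a description of PRESIM's internal code.

```python
CONFIG_TEXT = """
; values from the config file used to calibrate RNL
capda   .00010   n-diffusion capacitance - area, pf/sq-micron
capdp   .00060   n-diffusion capacitance - perimeter, pf/micron
capga   .00057   gate capacitance - area, pf/sq-micron
lambda  1.0      microns/lambda
"""

def parse_config(text):
    """Return {parameter: value} for simple 'parameter value comments' lines."""
    params = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop ';' comments
        if not line:
            continue
        fields = line.split()
        if fields[0] == "resistance":         # multi-field entries not handled here
            continue
        try:
            params[fields[0]] = float(fields[1])
        except (IndexError, ValueError):
            continue                          # skip malformed lines
    return params

def ndiff_capacitance_pf(params, width_lambda, length_lambda):
    """Area term plus perimeter term for a rectangular n-diffusion region."""
    scale = params["lambda"]                  # microns per lambda
    w = width_lambda * scale
    l = length_lambda * scale
    return params["capda"] * (w * l) + params["capdp"] * (2 * (w + l))

params = parse_config(CONFIG_TEXT)
print(round(ndiff_capacitance_pf(params, 4, 10), 4))  # hypothetical 4 x 10 lambda region
```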
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APPENDIX  D 
ADDER  SIMULATION 

The following two listings are: (1) the RNL command file
for the entire chip and (2) the results of running that
command file. In addition to this overall testing, all the
layouts of Appendix G were simulated individually. A nice
feature of RNL is the indication of when a watched node
changes state. Thus, by making all the outputs of a circuit
watched nodes, RNL will provide the minimum time duration
for a clock cycle to produce the outputs (the longest time
indicated by the simulation). This can be confirmed by
running the simulation with a faster clock, resulting in
outputs of X (neither 1 nor 0) where insufficient time has
been allowed.
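
The idea of reading the minimum clock phase duration from the
watched-node report can be sketched as follows. This Python
fragment assumes that each transition is reported on a line of
the form "node=value @ delay", with the delay in nanoseconds
from the start of the step; that line format and the function
name are assumptions made for illustration.

```python
import re

# One watched-node transition, e.g. "s16=0 @ 14.2"
TRANSITION = re.compile(
    r"^\s*(\w+)\s*=\s*([01Xx])\s*@\s*([0-9]+(?:\.[0-9]+)?)\s*$"
)

def slowest_transition(log_text):
    """Return (node, delay_ns) for the latest transition, or None if none found."""
    slowest = None
    for line in log_text.splitlines():
        match = TRANSITION.match(line)
        if not match:
            continue                      # step headers and state dumps are ignored
        node, delay = match.group(1), float(match.group(3))
        if slowest is None or delay > slowest[1]:
            slowest = (node, delay)
    return slowest

SAMPLE = """Step begins @ 1375 ns.
phi2=1 @ 0
s16=0 @ 14.2
s2=0 @ 16.7
s1=0 @ 20
cout=1 @ 21.1
"""

print(slowest_transition(SAMPLE))
```

The largest delay found over all steps is the shortest phase
duration that still lets every output settle.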

RNL simulation to determine the minimum time for
precharging the PLA circuits is only slightly more involved.
For each product term in the PLA, alternating inputs are
selected that will result in the maximum amount of N+ diffusion
needing to be charged from 0 volts to Vdd. Then, as these
inputs are alternated, the PLA precharge time is reduced
until the circuit fails to produce correct results. For the
PLA's in the adder, visual inspection for the product term
with the longest precharge requirement was done by looking
for the longest N+ diffusion line which must be charged
through the maximum number of transistors. The visual
inspection results were confirmed by RNL simulations.
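
The reduction procedure just described amounts to a simple
downward sweep. The Python sketch below shows its shape only:
the `passes` callback stands in for an RNL run with alternating
inputs at a given precharge time, and both it and the 7.3 ns
failure point used in the example are hypothetical.

```python
def min_precharge(passes, start_ns, step_ns, floor_ns=0.0):
    """Shorten the precharge time until passes(t) fails.

    Returns the shortest precharge time that still passed, or None
    if even the starting value fails.
    """
    t = start_ns
    last_ok = None
    while t > floor_ns and passes(t):
        last_ok = t
        t -= step_ns
    return last_ok

# Hypothetical stand-in for a simulation run: the circuit is taken to
# work whenever the precharge time is at least 7.3 ns.
def fake_simulation(t_ns):
    return t_ns >= 7.3

print(min_precharge(fake_simulation, start_ns=25.0, step_ns=1.0))
```

A finer step near the failure point would tighten the
estimate at the cost of more simulation runs.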
Step begins @ 0 ns.
phi1=0 @ 0
phi2=0 @ 0
cin=0 @ 0
[five control-input entries beginning "con"; names illegible in the source]
b16=0 @ 0
b15=0 @ 0
b14=0 @ 0
b13=0 @ 0
b12=0 @ 0
b11=0 @ 0
b10=0 @ 0
b9=0 @ 0
b8=0 @ 0
b7=0 @ 0
b6=0 @ 0
b5=0 @ 0
b4=0 @ 0
b3=0 @ 0
b2=0 @ 0
b1=0 @ 0
a16=0 @ 0
a15=0 @ 0
a14=0 @ 0
a13=0 @ 0
a12=0 @ 0
a11=0 @ 0
a10=0 @ 0
a9=0 @ 0
a8=0 @ 0
a7=0 @ 0
a6=0 @ 0
a5=0 @ 0
a4=0 @ 0
a3=0 @ 0
a2=0 @ 0
a1=0 @ 0

Step begins @ 10 ns.
phi1=1 @ 0

Step begins @ 35 ns.
phi1=0 @ 0

Step begins @ 45 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s13=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s5=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
state is now:
Current time= 70
clocks=0b01 cin=0 cout=X
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0bX0000000000000000

Step begins @ 70 ns.
phi2=0 @ 0

Step begins @ 80 ns.
phi1=1 @ 0

Step begins @ 105 ns.
phi1=0 @ 0

Step begins @ 115 ns.
phi2=1 @ 0
cout=0 @ 22.9
state is now:
Current time= 140
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 140 ns.
phi2=0 @ 0

Step begins @ 150 ns.
phi1=1 @ 0

Step begins @ 175 ns.
phi1=0 @ 0

Step begins @ 185 ns.
phi2=1 @ 0
state is now:
Current time= 210
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 210 ns.
phi2=0 @ 0

Step begins @ 220 ns.
phi1=1 @ 0

Step begins @ 245 ns.
phi1=0 @ 0

Step begins @ 255 ns.
phi2=1 @ 0
state is now:
Current time= 280
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 280 ns.
phi2=0 @ 0

Step begins @ 290 ns.
phi1=1 @ 0

Step begins @ 315 ns.
phi1=0 @ 0

Step begins @ 325 ns.
phi2=1 @ 0
state is now:
Current time= 350
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000


Step begins @ 350 ns.
b16=1 @ 0
b15=1 @ 0
b14=1 @ 0
b13=1 @ 0
b8=1 @ 0
b7=1 @ 0
b6=1 @ 0
b5=1 @ 0
a12=1 @ 0
a11=1 @ 0
a10=1 @ 0
a9=1 @ 0
a4=1 @ 0
a3=1 @ 0
a2=1 @ 0
a1=1 @ 0
phi2=0 @ 0

Step begins @ 360 ns.
phi1=1 @ 0

Step begins @ 385 ns.
phi1=0 @ 0

Step begins @ 395 ns.
phi2=1 @ 0
state is now:
Current time= 420
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b00000000000000000

Step begins @ 420 ns.
phi2=0 @ 0

Step begins @ 430 ns.
phi1=1 @ 0

Step begins @ 455 ns.
phi1=0 @ 0

Step begins @ 465 ns.
phi2=1 @ 0
state is now:
Current time= 490
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b00000000000000000

Step begins @ 490 ns.
phi2=0 @ 0

Step begins @ 500 ns.
phi1=1 @ 0

Step begins @ 525 ns.
phi1=0 @ 0

Step begins @ 535 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s13=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s5=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
s10=1 @ 16.8
s8=1 @ 16.8
s6=1 @ 16.8
s4=1 @ 16.8
s2=1 @ 17
s1=1 @ 19.1
state is now:
Current time= 560
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b1111000011110000
sum=0b01111111111111111


Step begins @ 560 ns.
b9=1 @ 0
b1=1 @ 0
b16=0 @ 0
b15=0 @ 0
b14=0 @ 0
b13=0 @ 0
b8=0 @ 0
b7=0 @ 0
b6=0 @ 0
b5=0 @ 0
phi2=0 @ 0

Step begins @ 570 ns.
phi1=1 @ 0

Step begins @ 595 ns.
phi1=0 @ 0

Step begins @ 605 ns.
phi2=1 @ 0
state is now:
Current time= 630
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b01111111111111111

Step begins @ 630 ns.
phi2=0 @ 0

Step begins @ 640 ns.
phi1=1 @ 0

Step begins @ 665 ns.
phi1=0 @ 0

Step begins @ 675 ns.
phi2=1 @ 0
state is now:
Current time= 700
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b01111111111111111

Step begins @ 700 ns.
phi2=0 @ 0

Step begins @ 710 ns.
phi1=1 @ 0

Step begins @ 735 ns.
phi1=0 @ 0

Step begins @ 745 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
state is now:
Current time= 770
clocks=0b01 cin=0 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b00001000000010000

Step begins @ 770 ns.
cin=1 @ 0
phi2=0 @ 0

Step begins @ 780 ns.
phi1=1 @ 0

Step begins @ 805 ns.
phi1=0 @ 0

Step begins @ 815 ns.
phi2=1 @ 0
state is now:
Current time= 840
clocks=0b01 cin=1 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b00001000000010000

Step begins @ 840 ns.
phi2=0 @ 0

Step begins @ 850 ns.
phi1=1 @ 0

Step begins @ 875 ns.
phi1=0 @ 0

Step begins @ 885 ns.
phi2=1 @ 0
state is now:
Current time= 910
clocks=0b01 cin=1 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b00001000000010000

Step begins @ 910 ns.
phi2=0 @ 0

Step begins @ 920 ns.
phi1=1 @ 0

Step begins @ 945 ns.
phi1=0 @ 0

Step begins @ 955 ns.
phi2=1 @ 0
s1=1 @ 19.3
state is now:
Current time= 980
clocks=0b01 cin=1 cout=0
aaaa=0b0000111100001111
bbbb=0b0000000100000001
sum=0b00001000000010001

Step begins @ 980 ns.
a16=1 @ 0
a15=1 @ 0
a14=1 @ 0
a13=1 @ 0
a8=1 @ 0
a7=1 @ 0
a6=1 @ 0
a5=1 @ 0
b9=0 @ 0
b1=0 @ 0
cin=0 @ 0
phi2=0 @ 0

Step begins @ 990 ns.
phi1=1 @ 0

Step begins @ 1015 ns.
phi1=0 @ 0

Step begins @ 1025 ns.
phi2=1 @ 0
state is now:
Current time= 1050
clocks=0b01 cin=0 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b00001000000010001

Step begins @ 1050 ns.
phi2=0 @ 0

Step begins @ 1060 ns.
phi1=1 @ 0

Step begins @ 1085 ns.
phi1=0 @ 0

Step begins @ 1095 ns.
phi2=1 @ 0
state is now:
Current time= 1120
clocks=0b01 cin=0 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b00001000000010001

Step begins @ 1120 ns.
phi2=0 @ 0

Step begins @ 1130 ns.
phi1=1 @ 0

Step begins @ 1155 ns.
phi1=0 @ 0

Step begins @ 1165 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
s10=1 @ 16.8
s8=1 @ 16.8
s6=1 @ 16.8
s4=1 @ 16.8
s2=1 @ 17
state is now:
Current time= 1190
clocks=0b01 cin=0 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b01111111111111111

Step begins @ 1190 ns.
cin=1 @ 0
phi2=0 @ 0

Step begins @ 1200 ns.
phi1=1 @ 0

Step begins @ 1225 ns.
phi1=0 @ 0

Step begins @ 1235 ns.
phi2=1 @ 0
state is now:
Current time= 1260
clocks=0b01 cin=1 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b01111111111111111

Step begins @ 1260 ns.
phi2=0 @ 0

Step begins @ 1270 ns.
phi1=1 @ 0

Step begins @ 1295 ns.
phi1=0 @ 0

Step begins @ 1305 ns.
phi2=1 @ 0
state is now:
Current time= 1330
clocks=0b01 cin=1 cout=0
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b01111111111111111

Step begins @ 1330 ns.
phi2=0 @ 0

Step begins @ 1340 ns.
phi1=1 @ 0

Step begins @ 1365 ns.
phi1=0 @ 0

Step begins @ 1375 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s13=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s5=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
cout=1 @ 21.1
state is now:
Current time= 1400
clocks=0b01 cin=1 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b10000000000000000


Step begins @ 1400 ns.
b1=1 @ 0
cin=0 @ 0
phi2=0 @ 0

Step begins @ 1410 ns.
phi1=1 @ 0

Step begins @ 1435 ns.
phi1=0 @ 0

Step begins @ 1445 ns.
phi2=1 @ 0
state is now:
Current time= 1470
clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000001
sum=0b10000000000000000

Step begins @ 1470 ns.
phi2=0 @ 0

Step begins @ 1480 ns.
phi1=1 @ 0

Step begins @ 1505 ns.
phi1=0 @ 0

Step begins @ 1515 ns.
phi2=1 @ 0
state is now:
Current time= 1540
clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000001
sum=0b10000000000000000

Step begins @ 1540 ns.
phi2=0 @ 0

Step begins @ 1550 ns.
phi1=1 @ 0

Step begins @ 1575 ns.
phi1=0 @ 0

Step begins @ 1585 ns.
phi2=1 @ 0
state is now:
Current time= 1610
clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000001
sum=0b10000000000000000

Step begins @ 1610 ns.
b1=0 @ 0
phi2=0 @ 0

Step begins @ 1620 ns.
phi1=1 @ 0

Step begins @ 1645 ns.
phi1=0 @ 0

Step begins @ 1655 ns.
phi2=1 @ 0
state is now:
Current time= 1680
clocks=0b01 cin=0 cout=1
aaaa=0b1111111111111111
bbbb=0b0000000000000000
sum=0b10000000000000000


Step begins @ 1680 ns.
a16=0 @ 0
a15=0 @ 0
a14=0 @ 0
a13=0 @ 0
a12=0 @ 0
a11=0 @ 0
a10=0 @ 0
a9=0 @ 0
a8=0 @ 0
a7=0 @ 0
a6=0 @ 0
a5=0 @ 0
a4=0 @ 0
a3=0 @ 0
a2=0 @ 0
a1=0 @ 0
phi2=0 @ 0

Step begins @ 1690 ns.
phi1=1 @ 0

Step begins @ 1715 ns.
phi1=0 @ 0

Step begins @ 1725 ns.
phi2=1 @ 0
state is now:
Current time= 1750
clocks=0b01 cin=0 cout=1
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b10000000000000000

Step begins @ 1750 ns.
b16=1 @ 0
b15=1 @ 0
b14=1 @ 0
b13=1 @ 0
b12=1 @ 0
b11=1 @ 0
b10=1 @ 0
b9=1 @ 0
b8=1 @ 0
b7=1 @ 0
b6=1 @ 0
b5=1 @ 0
b4=1 @ 0
b3=1 @ 0
b2=1 @ 0
b1=1 @ 0
phi2=0 @ 0

Step begins @ 1760 ns.
phi1=1 @ 0

Step begins @ 1785 ns.
phi1=0 @ 0

Step begins @ 1795 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s13=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s5=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
s10=1 @ 16.8
s8=1 @ 16.8
s6=1 @ 16.8
s4=1 @ 16.8
s2=1 @ 17
s1=1 @ 19.1
cout=0 @ 22.9
state is now:
Current time= 1820
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b1111111111111111
sum=0b01111111111111111

Step begins @ 1820 ns.
a12=1 @ 0
phi2=0 @ 0

Step begins @ 1830 ns.
phi1=1 @ 0

Step begins @ 1855 ns.
phi1=0 @ 0

Step begins @ 1865 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s13=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s5=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5
s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
state is now:
Current time= 1890
clocks=0b01 cin=0 cout=0
aaaa=0b0000100000000000
bbbb=0b1111111111111111
sum=0b00000000000000000

Step begins @ 1890 ns.
b12=0 @ 0
phi2=0 @ 0

Step begins @ 1900 ns.
phi1=1 @ 0

Step begins @ 1925 ns.
phi1=0 @ 0

Step begins @ 1935 ns.
phi2=1 @ 0
s16=1 @ 14.6
s9=1 @ 16.7
s11=1 @ 16.7
s13=1 @ 16.7
s15=1 @ 16.7
s7=1 @ 16.7
s5=1 @ 16.7
s3=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
s10=1 @ 16.8
s8=1 @ 16.8
s6=1 @ 16.8
s4=1 @ 16.8
s2=1 @ 17
s1=1 @ 19.1
state is now:
Current time= 1960
clocks=0b01 cin=0 cout=0
aaaa=0b0000100000000000
bbbb=0b1111011111111111
sum=0b01111111111111111


Step begins @ 1960 ns.
cin=1 @ 0
phi2=0 @ 0

Step begins @ 1970 ns.
phi1=1 @ 0

Step begins @ 1995 ns.
phi1=0 @ 0

Step begins @ 2005 ns.
phi2=1 @ 0
s16=0 @ 14.2
s13=0 @ 16.4
s15=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
cout=1 @ 21.1
state is now:
Current time= 2030
clocks=0b01 cin=1 cout=1
aaaa=0b0000100000000000
bbbb=0b1111011111111111
sum=0b10000011111111111


Step begins @ 2030 ns.
b16=0 @ 0
b15=0 @ 0
b14=0 @ 0
b13=0 @ 0
b11=0 @ 0
b10=0 @ 0
b9=0 @ 0
b8=0 @ 0
b7=0 @ 0
b6=0 @ 0
b5=0 @ 0
b4=0 @ 0
b3=0 @ 0
b2=0 @ 0
b1=0 @ 0
a12=0 @ 0
phi2=0 @ 0

Step begins @ 2040 ns.
phi1=1 @ 0

Step begins @ 2065 ns.
phi1=0 @ 0

Step begins @ 2075 ns.
phi2=1 @ 0
s16=1 @ 14.6
s13=1 @ 16.7
s15=1 @ 16.7
s14=1 @ 16.8
s12=1 @ 16.8
cout=0 @ 22.9
state is now:
Current time= 2100
clocks=0b01 cin=1 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b01111111111111111


Step begins @ 2100 ns.
cin=0 @ 0
phi2=0 @ 0

Step begins @ 2110 ns.
phi1=1 @ 0

Step begins @ 2135 ns.
phi1=0 @ 0

Step begins @ 2145 ns.
phi2=1 @ 0
s16=0 @ 14.2
s9=0 @ 16.4
s11=0 @ 16.4
s13=0 @ 16.4
s15=0 @ 16.4
s7=0 @ 16.4
s5=0 @ 16.4
s3=0 @ 16.4
s14=0 @ 16.5
s12=0 @ 16.5
s10=0 @ 16.5
s8=0 @ 16.5
s6=0 @ 16.5




s4=0 @ 16.5
s2=0 @ 16.7
s1=0 @ 20
cout=1 @ 21.1
state is now:
Current time= 2170
clocks=0b01 cin=0 cout=1
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b10000000000000000

Step begins @ 2170 ns.
phi2=0 @ 0

Step begins @ 2180 ns.
phi1=1 @ 0

Step begins @ 2205 ns.
phi1=0 @ 0

Step begins @ 2215 ns.
phi2=1 @ 0
cout=0 @ 22.9
state is now:
Current time= 2240
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000100000000000
sum=0b00000000000000000

Step begins @ 2240 ns.
phi2=0 @ 0

Step begins @ 2250 ns.
phi1=1 @ 0

Step begins @ 2275 ns.
phi1=0 @ 0

Step begins @ 2285 ns.
phi2=1 @ 0
s1=0 @ 20
state is now:
Current time= 2310
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000000000000000
sum=0b00000000000000000

Step begins @ 2310 ns.
phi2=0 @ 0

Step begins @ 2320 ns.
phi1=1 @ 0

Step begins @ 2345 ns.
phi1=0 @ 0

Step begins @ 2355 ns.
phi2=1 @ 0
state is now:
Current time= 2380
clocks=0b01 cin=0 cout=0
aaaa=0b0000000000000000
bbbb=0b0000100000000000
sum=0b00000000000000000


exit

APPENDIX E
LAYOUTS


LEGEND

Contact Cut
p-well
P+ doping
Polysilicon
Diffusion
Metal


AND Gate


OR Gate
A + B


XOR Gate


in
CON(n)
CON(n)
out
shift out


GND

ES4       ES3       ES2       ES1


PLA84


Vdd

GND

S4         S3         S2         S1


PLA104


Vdd


PLA915


APPENDIX F
TEST VECTORS


Addend A            Addend B            Cin   Sum
msb - - - - lsb     msb - - - - lsb           msb - - - - lsb

initialize all internal nodes

0000000000000000    0000000000000000    0     xxxxxxxxxxxxxxxxx
0000000000000000    0000000000000000    0     xxxxxxxxxxxxxxxxx
0000000000000000    0000000000000000    0     00000000000000000

test for proper P and G primitives

0000000000000000    1111111111111111    0     01111111111111111
1111111111111111    0000000000000000    0     01111111111111111
0101010101010101    1010101010101010    0     01111111111111111
1010101010101010    0101010101010101    0     01111111111111111

test for proper IES

0001000100010001    0000000000000000    0     00001000100010001
0001000100010001    0001000100010001    0     00010001000100010
0101010101010101    0001000100010001    0     00110011001100110
0101010101010101    0101010101010101    0     01010101010101010

test for proper IC23

0101010101010101    0011001100110011    0     01000100010001000
0010001000100010    0011001100110011    0     00101010101010101

test for carry from block to block

0000000000001111    0000000000000001    0     00000000000010000
0000000000001111    0000000000000000    1     00000000000010000
0000000011111111    0000000000000000    1     00000000100000000
0000000011111111    0000000000000001    0     00000000100000000
0000000011111111    0000000000010000    0     00000000100001111
0000111111111111    0000000000000000    1     00001000000000000
0000111111111111    0000000000000001    0     00001000000000000
0000111111111111    0000000000010000    0     00001000000001111
0000111111111111    0000000100000000    0     00001000011111111
1111111111111111    0000000000000000    1     10000000000000000
1111111111111111    0000000000000001    0     10000000000000000
1111111111111111    0000000000010000    0     10000000000001111
1111111111111111    0000000100000000    0     10000000011111111
1111111111111111    0001000000000000    0     10000111111111111
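
The Sum entries in these test vectors can be cross-checked with a few lines of Python. This is only a minimal sketch, not part of the original test procedure. It assumes that each 17-digit Sum entry is the carry-out followed by the sixteen sum bits, which is what the width of the entries suggests. The function name `expected_sum` is hypothetical, and the four vectors are copied from the table.

```python
def expected_sum(a_bits: str, b_bits: str, cin: int) -> str:
    """Return the 17-bit result (assumed: carry-out, then 16 sum bits)."""
    total = int(a_bits, 2) + int(b_bits, 2) + cin
    return format(total, "017b")


# Vectors copied from the table: (Addend A, Addend B, Cin)
vectors = [
    ("0000000000000000", "1111111111111111", 0),  # P and G primitives
    ("0101010101010101", "1010101010101010", 0),  # P and G primitives
    ("0000000000001111", "0000000000000001", 0),  # carry from block to block
    ("1111111111111111", "0000000000000000", 1),  # carry through all blocks
]

for a, b, cin in vectors:
    print(a, b, cin, expected_sum(a, b, cin))
```

The script checks only the arithmetic of the table. It does not model the pipeline, so it says nothing about the cycle in which a given sum appears at the outputs.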
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